Replace GPT3Tokenizer with TikToken for token estimation #87

Merged 6 commits on Nov 26, 2023

@gfargo gfargo commented Nov 26, 2023

As the title indicates, this PR uses the TikToken library to estimate the total size of the request. GPT3Tokenizer only targets GPT-3, which limits its scope, whereas TikToken supports any of the TikTokenModels available through OpenAI.


  • Fix tiktoken version in package.json: Lock tiktoken dependency version to 1.0.11 in package.json for consistency across environments.
  • Refactor handleResult and update handlers for commit and changelog: Update handleResult function to accept an object with a new optional interactiveHandler property. Apply this change to commit and changelog handlers for more flexible result handling in interactive mode.
  • Refactor getTokenCounter function and clean up types.ts: Improve readability and maintainability of getTokenCounter function and clean up unnecessary imports in types.ts.
  • Replace GPT3Tokenizer with TokenCounter in utils: Substitute GPT3Tokenizer with TokenCounter in collectDiffs.ts and summarizeDiffs.ts files. Update import statements and function calls in commit handler and changelog handler. Add tokenizer.ts and delete getTokenizer.ts in utils. Update types in lib.
  • Add utility functions for service configuration: Introduce utility functions for handling service configuration in utils.ts. Rename getModel function to getLlm and update it and getApiKeyForModel function to use new utility functions. Rename getChain function to getSummarizationChain. Update DEFAULT_CONFIG in constants.ts.
  • Update 'model' to 'service' in config and commands: Rename 'model' variable to 'service' in various files and objects. Update defaultConfig object across multiple test files and the loadGitConfig function. Update DEFAULT_CONFIG object and 'BaseCommandOptions' interface.
  • Update dependencies in package.json: Remove gpt3-tokenizer and add tiktoken in the dependencies.

This commit includes a series of changes in the `/src/lib/config/services`, `/src/lib/config`, and `/src/commands` directories. The main change is the renaming of the 'model' variable to 'service' in various files and objects. This includes the defaultConfig object across multiple test files and the loadGitConfig function. In addition, the DEFAULT_CONFIG object and the 'BaseCommandOptions' interface have been updated to reflect this change. New types related to the service provider and service model have also been added.

This update introduces three utility functions in `utils.ts` to handle service configuration. These functions are `getModelAndProviderFromService`, `getModelFromService`, and `getProviderFromService`. They improve the way the service configuration is parsed and used throughout the file.
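
For illustration, here is a minimal sketch of how these three helpers could parse a service string such as 'openai/gpt-4'. The function names come from this PR, but the signatures, the `ServiceParts` type, and the error handling are assumptions and may not match the code in `utils.ts`.

```typescript
// Hypothetical return shape; the PR does not show the real type.
type ServiceParts = { provider: string; model: string };

// Assumed behavior: a service string looks like "<provider>/<model>".
function getModelAndProviderFromService(service: string): ServiceParts {
  // Split on the first "/" only, so a model name containing "/" stays intact.
  const separatorIndex = service.indexOf("/");
  if (separatorIndex <= 0 || separatorIndex === service.length - 1) {
    throw new Error(`Expected "<provider>/<model>", got "${service}"`);
  }
  return {
    provider: service.slice(0, separatorIndex),
    model: service.slice(separatorIndex + 1),
  };
}

function getModelFromService(service: string): string {
  return getModelAndProviderFromService(service).model;
}

function getProviderFromService(service: string): string {
  return getModelAndProviderFromService(service).provider;
}
```

With this sketch, `getProviderFromService("openai/gpt-4")` returns `"openai"` and `getModelFromService("openai/gpt-4")` returns `"gpt-4"`.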

The `getModel` function is renamed to `getLlm` and updated to use the new utility functions for better code readability and maintainability. The `getApiKeyForModel` function is also updated to utilize the new utility functions.

In addition, the `getChain` function is renamed to `getSummarizationChain` for clarity.

The `DEFAULT_CONFIG` in `constants.ts` is also updated, changing the default service from 'openai/gpt-4-32k' to 'openai/gpt-4'.

Replaced `GPT3Tokenizer` with `TokenCounter` in `collectDiffs.ts` and `summarizeDiffs.ts`. This change affects the `collectDiffs` and `summarizeDirectoryDiff` functions and the `SummarizeDiffsOptions` options, which now use `TokenCounter` for token counting.

---

Update import statements and function calls in commit handler

Removed `getTokenizer` and added `getLlm`, `getModelFromService`, and `getTokenCounter` in the import section of `handler.ts`. Also updated the function calls within the `handler` function to use these new imports.

---

Replace `getChain` with `getSummarizationChain` in default parser

In `index.ts`, replaced `getChain` with `getSummarizationChain` in the import statement and in the `fileChangeParser` function. Also renamed the `model` parameter to `llm` in the `options` object.

---

Update import and function calls in changelog handler

Replaced `getModel` with `getLlm` in the import statement and updated the `getApiKeyForModel` and `getLlm` function calls in the `handler` function to use `options.service` instead of `options.model`.

---

Add `tokenizer.ts` and delete `getTokenizer.ts` in utils

Deleted `getTokenizer.ts` and added `tokenizer.ts` which includes the `getTikToken`, `getTokenCount`, and `getTokenCounter` functions.

---

Update types in lib

In `types.ts`, replaced `getTokenizer` and `getModel` with `TokenCounter` and `getLlm` respectively in the `BaseParserOptions` interface.

---

Update dependencies in package.json

Removed `gpt3-tokenizer` and added `tiktoken` in the dependencies of `package.json`.

In `tokenizer.ts`, the `getTokenCounter` function has been refactored for better readability and maintainability. The function now directly returns a function that takes a string and returns the token count, instead of calling `getTokenCount`. This eliminates the need for the `getTokenCount` function, which has been removed.
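
A rough sketch of that shape follows. It assumes `getTokenCounter` receives an encoder object with an `encode` method; the `Encoder` interface, the parameter, and the whitespace stand-in encoder are hypothetical and only keep the example self-contained. The real function presumably wraps an encoder obtained from `tiktoken`.

```typescript
// Assumed minimal encoder surface for this sketch (hypothetical interface).
interface Encoder {
  encode(text: string): ArrayLike<number>;
}

// A token counter is just a function from text to a token count.
type TokenCounter = (text: string) => number;

// Returns the counting function directly, with no separate getTokenCount step.
function getTokenCounter(encoder: Encoder): TokenCounter {
  return (text: string): number => encoder.encode(text).length;
}

// Stand-in encoder for illustration only: one "token" per whitespace-separated word.
// This is NOT how BPE tokenization works; it just satisfies the interface.
const whitespaceEncoder: Encoder = {
  encode: (text: string): number[] =>
    text
      .split(/\s+/)
      .filter(Boolean)
      .map((_, index) => index),
};

const countTokens = getTokenCounter(whitespaceEncoder);
```

Callers such as `collectDiffs` can then depend on the `TokenCounter` type alone, without knowing which tokenizer sits behind it.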

In `types.ts`, unnecessary imports have been removed to clean up the file. The removed imports include `getTikToken` from `./utils/getTokenCounter` and `getTokenCounter` from `./utils/tokenizer`.

This commit includes an update to the `handleResult` function in `src/lib/ui/handleResult.ts` to accept an object input with a new optional `interactiveHandler` property. This allows for more flexible handling of results in interactive mode.

In `src/commands/commit/handler.ts`, the `handleResult` function has been updated to use the new `interactiveHandler` property. This change allows the commit handler to create a commit and log success when in interactive mode.

Similarly, in `src/commands/changelog/handler.ts`, the `handleResult` function has been updated to use the new `interactiveHandler` property. This change allows the changelog handler to log the result and log success when in interactive mode.
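
The pattern described above might look roughly like the following. Only `handleResult` and `interactiveHandler` are named in this PR; the `HandleResultInput` type, the `result` and `interactive` fields, and the stdout fallback are assumptions made for the sketch.

```typescript
// Hypothetical input shape; only interactiveHandler is named in the PR.
type HandleResultInput = {
  result: string;
  interactive: boolean;
  interactiveHandler?: (result: string) => void;
};

function handleResult({ result, interactive, interactiveHandler }: HandleResultInput): void {
  // In interactive mode, delegate to the command-specific handler when one is given.
  if (interactive && interactiveHandler) {
    interactiveHandler(result);
    return;
  }
  // Assumed fallback: print the result so it can be piped to another tool.
  console.log(result);
}
```

A commit command could pass a handler that creates the commit and logs success, while a changelog command could pass one that only logs the result, without `handleResult` knowing about either.
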

In `package.json`, the `tiktoken` dependency version has been updated. Previously, the version range allowed any new minor or patch release to be installed automatically. It is now locked to exactly 1.0.11 to ensure consistency across environments.
@gfargo gfargo merged commit fe2a3a9 into main Nov 26, 2023
5 checks passed
@gfargo gfargo deleted the chore/use-tiktoken-for-local-bpe-encoding branch November 26, 2023 23:46