Replace `GPT3Tokenizer` with `TikToken` for token estimation #87

Merged
This commit includes a series of changes in the `/src/lib/config/services`, `/src/lib/config`, and `/src/commands` directories. The main change is the renaming of the `model` variable to `service` in various files and objects, including the `defaultConfig` object in several test files and the `loadGitConfig` function. In addition, the `DEFAULT_CONFIG` object and the `BaseCommandOptions` interface have been updated to reflect this change. New types for the service provider and service model have also been added.
This update introduces three utility functions in `utils.ts` to handle service configuration: `getModelAndProviderFromService`, `getModelFromService`, and `getProviderFromService`. They are used throughout the file to parse the service configuration. The `getModel` function is renamed to `getLlm` and updated to use the new utility functions, for better readability and maintainability. The `getApiKeyForModel` function is also updated to use them. In addition, the `getChain` function is renamed to `getSummarizationChain` for clarity. The `DEFAULT_CONFIG` in `constants.ts` is also updated, changing the default service from 'openai/gpt-4-32k' to 'openai/gpt-4'.
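The three service helpers can all be built on a single parse of the service string. A minimal sketch, assuming a service is a `"provider/model"` string such as the `'openai/gpt-4'` default; the `ServiceProvider` union, the `KNOWN_PROVIDERS` list, and the error handling are illustrative assumptions, not the project's actual definitions.

```typescript
// Illustrative types: a service is "<provider>/<model>". Only "openai" appears in the PR text;
// extend this union if more providers are supported.
type ServiceProvider = "openai";
type ServiceModel = string;
type Service = `${ServiceProvider}/${ServiceModel}`;

interface ModelAndProvider {
  provider: ServiceProvider;
  model: ServiceModel;
}

const KNOWN_PROVIDERS: ServiceProvider[] = ["openai"];

// Parse once; the two single-value helpers below delegate to this function.
function getModelAndProviderFromService(service: Service): ModelAndProvider {
  const slash = service.indexOf("/");
  if (slash <= 0 || slash === service.length - 1) {
    throw new Error(`Invalid service "${service}": expected "provider/model"`);
  }
  // Split on the first "/" only, so the model part may itself contain "/".
  const provider = service.slice(0, slash) as ServiceProvider;
  if (KNOWN_PROVIDERS.indexOf(provider) === -1) {
    throw new Error(`Unknown provider "${provider}" in service "${service}"`);
  }
  return { provider, model: service.slice(slash + 1) };
}

function getModelFromService(service: Service): ServiceModel {
  return getModelAndProviderFromService(service).model;
}

function getProviderFromService(service: Service): ServiceProvider {
  return getModelAndProviderFromService(service).provider;
}

// Mirrors the new default service described above.
const DEFAULT_CONFIG: { service: Service } = { service: "openai/gpt-4" };
```

With this shape, `getModelFromService(DEFAULT_CONFIG.service)` yields `"gpt-4"` and `getProviderFromService` yields `"openai"`. Splitting on the first `/` only keeps model names that contain further slashes intact; whether the real helpers do this is not stated in the PR.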
Replaced `GPT3Tokenizer` with `TokenCounter` in the `collectDiffs.ts` and `summarizeDiffs.ts` files. The `collectDiffs` and `summarizeDirectoryDiff` functions, along with `SummarizeDiffsOptions`, now use `TokenCounter` for token counting.

Update import statements and function calls in commit handler: Removed `getTokenizer` and added `getLlm`, `getModelFromService`, and `getTokenCounter` to the imports in `handler.ts`. Also updated the function calls within the `handler` function to use these new imports.

Replace `getChain` with `getSummarizationChain` in default parser: In `index.ts`, replaced `getChain` with `getSummarizationChain` in the import statement and in the `fileChangeParser` function. Also renamed the `model` parameter to `llm` in the `options` object.

Update import and function calls in changelog handler: Replaced `getModel` with `getLlm` in the import statement, and updated the `getApiKeyForModel` and `getLlm` calls in the `handler` function to use `options.service` instead of `options.model`.

Add `tokenizer.ts` and delete `getTokenizer.ts` in utils: Deleted `getTokenizer.ts` and added `tokenizer.ts`, which includes the `getTikToken`, `getTokenCount`, and `getTokenCounter` functions.

Update types in lib: In `types.ts`, replaced `getTokenizer` and `getModel` with `TokenCounter` and `getLlm`, respectively, in the `BaseParserOptions` interface.

Update dependencies in package.json: Removed `gpt3-tokenizer` and added `tiktoken` to the dependencies in `package.json`.
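Passing a `TokenCounter` instead of a tokenizer object means budget-checking code never needs to know which encoder backs it. A sketch of that idea, assuming a `TokenCounter` is simply `(text: string) => number` (the shape the later `getTokenCounter` refactor describes); the `FileDiff` type, `collectDiffsWithinBudget`, and its greedy budgeting policy are hypothetical and not the PR's actual `collectDiffs` logic.

```typescript
// A TokenCounter maps a piece of text to its token count.
type TokenCounter = (text: string) => number;

// Hypothetical shape for one file's diff; the real collectDiffs types are not shown in the PR.
interface FileDiff {
  file: string;
  diff: string;
}

interface BudgetResult {
  included: FileDiff[]; // diffs that fit, in their original order
  skipped: FileDiff[]; // diffs that would overflow the budget (e.g. candidates for summarization)
  totalTokens: number; // tokens used by the included diffs
}

// Greedily include each diff that still fits within maxTokens; set the rest aside.
function collectDiffsWithinBudget(
  diffs: FileDiff[],
  countTokens: TokenCounter,
  maxTokens: number,
): BudgetResult {
  const result: BudgetResult = { included: [], skipped: [], totalTokens: 0 };
  for (const d of diffs) {
    const tokens = countTokens(d.diff);
    if (result.totalTokens + tokens <= maxTokens) {
      result.included.push(d);
      result.totalTokens += tokens;
    } else {
      result.skipped.push(d);
    }
  }
  return result;
}

// Stand-in counter for demonstration only: counts whitespace-separated words, not real tokens.
const wordCounter: TokenCounter = (text) => text.split(/\s+/).filter(Boolean).length;
```

In real use the counter would come from the tokenizer module rather than `wordCounter`; the budgeting function stays the same either way.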
In `tokenizer.ts`, the `getTokenCounter` function has been refactored for better readability and maintainability. The function now directly returns a function that takes a string and returns the token count, instead of calling `getTokenCount`. This eliminates the need for the `getTokenCount` function, which has been removed. In `types.ts`, unnecessary imports have been removed to clean up the file. The removed imports include `getTikToken` from `./utils/getTokenCounter` and `getTokenCounter` from `./utils/tokenizer`.
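The refactored `getTokenCounter` returns a closure directly. A sketch of that shape, with one stated assumption: to keep it runnable without the `tiktoken` WASM package, the encoder is passed in as a parameter typed by a minimal, hypothetical `Encoder` interface, rather than being built inside the function (the PR's `tokenizer.ts` also defines `getTikToken`, which may fill that role).

```typescript
// Minimal structural type for an encoder (hypothetical). Tiktoken objects from the
// `tiktoken` npm package have a compatible `encode(text)` method returning a Uint32Array.
interface Encoder {
  encode(text: string): ArrayLike<number>;
}

type TokenCounter = (text: string) => number;

// Return the counting function directly instead of delegating to a separate
// getTokenCount helper; the encoder is captured once and reused on every call.
function getTokenCounter(encoder: Encoder): TokenCounter {
  return (text: string) => encoder.encode(text).length;
}
```

With the real library, an encoder from `tiktoken`'s `encoding_for_model("gpt-4")` fits the `Encoder` interface. Such encoders hold WASM memory and should be released with `free()` once they are no longer needed.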
This commit updates the `handleResult` function in `src/lib/ui/handleResult.ts` to accept a single object argument with a new optional `interactiveHandler` property. This allows more flexible handling of results in interactive mode. In `src/commands/commit/handler.ts`, the call to `handleResult` now passes an `interactiveHandler` that creates a commit and logs a success message. Similarly, in `src/commands/changelog/handler.ts`, the call to `handleResult` now passes an `interactiveHandler` that logs the result and a success message.
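A minimal sketch of the object-argument call shape. The PR states only that `handleResult` takes one object with an optional `interactiveHandler`; the `result` and `mode` fields and the print-to-console fallback are illustrative assumptions.

```typescript
// Hypothetical options shape; only `interactiveHandler` is named in the PR.
interface HandleResultOptions {
  result: string;
  mode: "interactive" | "stdout";
  interactiveHandler?: (result: string) => void | Promise<void>;
}

async function handleResult({ result, mode, interactiveHandler }: HandleResultOptions): Promise<void> {
  if (mode === "interactive" && interactiveHandler) {
    // Each command supplies its own follow-up, e.g. creating a commit or logging a changelog.
    await interactiveHandler(result);
    return;
  }
  // Fallback when no interactive handler applies: print the result.
  console.log(result);
}
```

A commit command would pass something like `interactiveHandler: async (message) => { /* create the commit, then log success */ }`, while the changelog command would pass a handler that logs the result. The shared function stays unaware of what each command does with an accepted result.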
In `package.json`, the `tiktoken` dependency version has been updated. Previously, it was set to automatically update with any new minor or patch version. Now it is locked to version 1.0.11 to ensure consistency across environments.
As the title indicates, this PR uses the `TikToken` library to estimate the total size of the request in tokens. The `GPT3Tokenizer` is just that, a GPT-3 tokenizer, and is therefore limited in scope. `TikToken` supports any of the `TikTokenModel`s available through OpenAI.

- Lock `tiktoken` version in package.json: Lock the `tiktoken` dependency to version 1.0.11 for consistency across environments.
- Update `handleResult` and the commit and changelog handlers: Update the `handleResult` function to accept an object with a new optional `interactiveHandler` property, and apply this change to the commit and changelog handlers for more flexible result handling in interactive mode.
- Refactor the `getTokenCounter` function and clean up `types.ts`: Improve the readability and maintainability of `getTokenCounter`, and remove unnecessary imports from `types.ts`.
- Replace `GPT3Tokenizer` with `TokenCounter` in utils: Use `TokenCounter` instead of `GPT3Tokenizer` in the `collectDiffs.ts` and `summarizeDiffs.ts` files. Update import statements and function calls in the commit and changelog handlers. Add `tokenizer.ts` and delete `getTokenizer.ts` in utils. Update types in lib.
- Add utility functions in `utils.ts`: Rename the `getModel` function to `getLlm`, and update it and the `getApiKeyForModel` function to use the new utility functions. Rename the `getChain` function to `getSummarizationChain`. Update `DEFAULT_CONFIG` in `constants.ts`.
- Rename `model` to `service`: Update the `DEFAULT_CONFIG` object and the 'BaseCommandOptions' interface.
- Remove `gpt3-tokenizer` and add `tiktoken` in the dependencies.