Add support for fine-tuned models in encoding_for_model #135
+15
−0
Issue
When calling encoding_for_model with a fine-tuned model as input, the following error occurs:
Analysis
See https://platform.openai.com/docs/models/model-endpoint-compatibility
See https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
The following models are allowed for fine-tuning:
All of them use the encoding r50k_base.
Fine-tuned model names always follow this format:
model:ft-personal:name:date
where
model is the base model from which the fine-tuned one was created
ft-personal is a fixed string indicating that the model is fine-tuned
name is a custom name that the user can give to the new model
date is the date of fine-tuning, in the format yyyy-MM-dd-hh-mm-ss
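To make the naming format concrete, here is a minimal sketch that splits a fine-tuned model name into its four parts. It is not part of this PR: FineTunedName and parse_fine_tuned_name are hypothetical names, the example model name is made up, and the hour field is assumed to be 24-hour.

```python
from datetime import datetime
from typing import NamedTuple


class FineTunedName(NamedTuple):
    """Hypothetical container for the four parts of a fine-tuned model name."""
    base_model: str
    marker: str
    custom_name: str
    created: datetime


def parse_fine_tuned_name(model_name: str) -> FineTunedName:
    """Split a name of the form model:ft-personal:name:date into its parts."""
    parts = model_name.split(":")
    if len(parts) != 4 or parts[1] != "ft-personal":
        raise ValueError(f"Not a fine-tuned model name: {model_name!r}")
    base_model, marker, custom_name, date = parts
    # Assumption: the hour field is 24-hour, so %H is used for "hh".
    created = datetime.strptime(date, "%Y-%m-%d-%H-%M-%S")
    return FineTunedName(base_model, marker, custom_name, created)


# Made-up example name, following the format described above.
parsed = parse_fine_tuned_name("davinci:ft-personal:my-model:2023-05-01-12-30-45")
print(parsed.base_model, parsed.created.isoformat())
```

Only the first part (the base model) matters for choosing an encoding, which is why a prefix match is enough.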
Solutions
Map the model prefixes in MODEL_PREFIX_TO_ENCODING, so that when encoding_for_model calls model_name.startswith, it also matches every model whose name starts with "davinci", "ada", etc., and therefore identifies fine-tuned models.
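The lookup described above can be sketched roughly as follows. This is a standalone illustration, not the actual tiktoken code or the diff in this PR: encoding_name_for_model is a hypothetical helper, the dictionaries contain only the two prefixes named above, and checking exact model names before prefixes is an assumption of the sketch.

```python
# Exact model names (assumed here to be checked first, so that they take
# precedence over any prefix match).
MODEL_TO_ENCODING = {
    "davinci": "r50k_base",
    "ada": "r50k_base",
}

# Prefixes of base models that can be fine-tuned. Only the two prefixes named
# in the description are listed; others would be added the same way.
MODEL_PREFIX_TO_ENCODING = {
    "davinci": "r50k_base",
    "ada": "r50k_base",
}


def encoding_name_for_model(model_name: str) -> str:
    """Return the encoding name for a model, falling back to prefix matching."""
    if model_name in MODEL_TO_ENCODING:
        return MODEL_TO_ENCODING[model_name]
    for prefix, encoding_name in MODEL_PREFIX_TO_ENCODING.items():
        if model_name.startswith(prefix):
            return encoding_name
    raise KeyError(f"Could not map {model_name!r} to an encoding")


# Made-up fine-tuned model name in the model:ft-personal:name:date format.
print(encoding_name_for_model("ada:ft-personal:my-model:2023-05-01-12-30-45"))
```

One consequence of matching on bare prefixes is that any model name beginning with "davinci" or "ada" resolves to r50k_base unless an exact entry overrides it, so exact names with a different encoding need to be checked before the prefix loop.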