Releases · knuddelsgmbh/jtokkit
1.1.0
What's Changed
- increase gpt-3.5-turbo maxContextLength to 16k by @dafriz in #92
- add gpt-4-turbo model by @dafriz in #94
- feat: Implement o200k_base encoding and support gpt-4o by @chatanywhere in #99
- add o200k_base encoding to docs by @dafriz in #101
- add gpt-4o-mini model by @dafriz in #102
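The new encoding and models listed above plug into the existing registry API. A minimal sketch, assuming the constants are named `ModelType.GPT_4O` and `EncodingType.O200K_BASE` (the exact constant names are an assumption):

```java
import com.knuddels.jtokkit.Encodings;
import com.knuddels.jtokkit.api.Encoding;
import com.knuddels.jtokkit.api.EncodingRegistry;
import com.knuddels.jtokkit.api.EncodingType;
import com.knuddels.jtokkit.api.ModelType;

public class O200kBaseExample {
    public static void main(String[] args) {
        EncodingRegistry registry = Encodings.newDefaultEncodingRegistry();

        // Resolve via the model; GPT_4O is assumed to map to o200k_base.
        Encoding viaModel = registry.getEncodingForModel(ModelType.GPT_4O);

        // Or request the new o200k_base encoding directly.
        Encoding viaType = registry.getEncoding(EncodingType.O200K_BASE);

        System.out.println(viaModel.countTokens("Hello, world!"));
        System.out.println(viaType.countTokens("Hello, world!"));
    }
}
```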
New Contributors
- @dafriz made their first contribution in #92
- @imsosleepy made their first contribution in #97
- @chatanywhere made their first contribution in #99
Full Changelog: 1.0.0...1.1.0
1.0.0
Features
- Improved performance of the CL100k encoding by 5x
- Thanks @paplorinc for the great work!
- Added `text-embedding-3-small` and `text-embedding-3-large` to the `ModelType` enum
Breaking Changes
- Due to the performance optimization, we now return a custom `IntArrayList` instead of a `List<Integer>` to prevent unnecessary boxing. The `IntArrayList` does not implement `List` and therefore is a breaking change. If you are missing any critical functionality from `IntArrayList`, please raise an issue.
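A minimal sketch of consuming the new return type, assuming `IntArrayList` exposes `size()`/`get(int)` accessors and that `decode` accepts the primitive list after this change:

```java
import com.knuddels.jtokkit.Encodings;
import com.knuddels.jtokkit.api.Encoding;
import com.knuddels.jtokkit.api.EncodingType;
import com.knuddels.jtokkit.api.IntArrayList;

public class IntArrayListExample {
    public static void main(String[] args) {
        Encoding encoding = Encodings.newDefaultEncodingRegistry()
                .getEncoding(EncodingType.CL100K_BASE);

        // Since 1.0.0, encode(...) returns an IntArrayList instead of a List<Integer>.
        IntArrayList tokens = encoding.encode("Hello, world!");

        // IntArrayList does not implement java.util.List, so iterate by index.
        for (int i = 0; i < tokens.size(); i++) {
            System.out.println(tokens.get(i));
        }

        // Decoding is assumed to accept the same primitive list.
        System.out.println(encoding.decode(tokens));
    }
}
```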
Full Changelog: 0.6.1...1.0.0
0.6.1
Fixes
- Added a workaround to prevent an issue with regex compilation on Android devices
Full Changelog: 0.6.0...0.6.1
0.6.0
0.5.1
Fixes
- Fixed an issue resulting in wrong encodings for Unicode input. Thanks @VoidIsVoid for raising and fixing this issue 🙂
New Contributors
- @VoidIsVoid made their first contribution in #34
Full Changelog: 0.5.0...0.5.1
0.5.0
Features
- Added a new `EncodingRegistry` that loads only the requested vocabularies lazily instead of loading all vocabularies eagerly at initialization. Thanks @blackdiz for raising this feature request and implementing it 😊
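A minimal sketch, assuming the lazy variant is exposed as `Encodings.newLazyEncodingRegistry()` (the factory method name is an assumption based on this release):

```java
import com.knuddels.jtokkit.Encodings;
import com.knuddels.jtokkit.api.Encoding;
import com.knuddels.jtokkit.api.EncodingRegistry;
import com.knuddels.jtokkit.api.EncodingType;

public class LazyRegistryExample {
    public static void main(String[] args) {
        // No vocabularies are parsed at this point.
        EncodingRegistry registry = Encodings.newLazyEncodingRegistry();

        // The cl100k_base vocabulary is loaded on first use and cached afterwards.
        Encoding encoding = registry.getEncoding(EncodingType.CL100K_BASE);
        System.out.println(encoding.countTokens("lazy loading"));
    }
}
```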
New Contributors
Full Changelog: 0.4.0...0.5.0
0.4.0
Features
- Added two new methods to `Encoding`: `encode(String, int)` and `encodeOrdinary(String, int)`. Both methods allow you to pass a maxTokens parameter that stops encoding after the given maximum number of tokens is reached. Thanks @radosdesign for raising this feature request and implementing it 😊
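A sketch against the current API, where `encode(String, int)` is assumed to return an `EncodingResult` exposing `getTokens()` and `isTruncated()` (the exact return type in this release is an assumption):

```java
import com.knuddels.jtokkit.Encodings;
import com.knuddels.jtokkit.api.Encoding;
import com.knuddels.jtokkit.api.EncodingResult;
import com.knuddels.jtokkit.api.EncodingType;

public class MaxTokensExample {
    public static void main(String[] args) {
        Encoding encoding = Encodings.newDefaultEncodingRegistry()
                .getEncoding(EncodingType.CL100K_BASE);

        // Stop encoding once at most 10 tokens have been produced.
        EncodingResult result = encoding.encode("a fairly long input that will be cut off", 10);

        System.out.println(result.getTokens());    // at most 10 tokens
        System.out.println(result.isTruncated());  // true if the input did not fit
    }
}
```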
Breaking Changes
- The `Encoding` interface got two new methods: `encode(String, int)` and `encodeOrdinary(String, int)`. If you implemented this interface yourself, you have to update your implementations when upgrading.
New Contributors
- @radosdesign made their first contribution in #12
Full Changelog: 0.3.0...0.4.0
0.3.0
Features
- Added gpt-4-32k to `ModelType`
- Added `ModelType#getMaxContextLength`, which returns the maximum context length the model allows. Note that this context length includes prompt tokens and, where applicable, completion tokens.
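A minimal sketch of the new lookup; the enum constant name `GPT_4_32K` and the concrete value are assumptions:

```java
import com.knuddels.jtokkit.api.ModelType;

public class MaxContextLengthExample {
    public static void main(String[] args) {
        // Maximum number of tokens the model accepts, counting prompt tokens
        // and, where applicable, completion tokens.
        int maxTokens = ModelType.GPT_4_32K.getMaxContextLength();
        System.out.println(maxTokens); // e.g. 32768 for gpt-4-32k
    }
}
```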
Breaking Changes
- The `name` and `encodingType` properties of `ModelType` were changed from public to private access. Migrate to `modelType.getName()` and `modelType.getEncodingType()` if you were previously using direct property access.
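A short before/after sketch of the migration:

```java
import com.knuddels.jtokkit.api.ModelType;

public class AccessorMigrationExample {
    public static void main(String[] args) {
        ModelType modelType = ModelType.GPT_4;

        // Before 0.3.0 (no longer compiles): modelType.name, modelType.encodingType
        // Since 0.3.0, use the accessor methods instead:
        System.out.println(modelType.getName());         // "gpt-4"
        System.out.println(modelType.getEncodingType()); // CL100K_BASE
    }
}
```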
Full Changelog: 0.2.0...0.3.0
0.2.0
Features
- Add encodeOrdinary and countTokensOrdinary methods to Encoding (see the sketch after this list).
  - The existing encode and countTokens methods throw an exception if a special token is encountered. This change introduces encodeOrdinary, which simply encodes special tokens as if they were normal text.
- Add getEncodingForModel(String) to EncodingRegistry to allow retrieving encodings for models by their string name.
  - It is now possible to call EncodingRegistry#getEncodingForModel(String) with a snapshot of a model, for example "gpt-4-0314", and receive the correct encoding.
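A minimal sketch of both additions; the String overload of getEncodingForModel is assumed to return an Optional, and the accepted model name strings are assumptions:

```java
import com.knuddels.jtokkit.Encodings;
import com.knuddels.jtokkit.api.Encoding;
import com.knuddels.jtokkit.api.EncodingRegistry;
import com.knuddels.jtokkit.api.EncodingType;

public class OrdinaryEncodingExample {
    public static void main(String[] args) {
        EncodingRegistry registry = Encodings.newDefaultEncodingRegistry();
        Encoding encoding = registry.getEncoding(EncodingType.CL100K_BASE);

        // encode(...) would throw here because the input contains a special token;
        // encodeOrdinary(...) treats it as plain text instead.
        System.out.println(encoding.encodeOrdinary("<|endoftext|>"));
        System.out.println(encoding.countTokensOrdinary("<|endoftext|>"));

        // Look up the encoding for a model snapshot by its string name.
        registry.getEncodingForModel("gpt-4-0314")
                .ifPresent(enc -> System.out.println(enc.countTokens("hello world")));
    }
}
```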
Full Changelog: 0.1.0...0.2.0