Releases: knuddelsgmbh/jtokkit

1.1.0

19 Jul 14:01

Full Changelog: 1.0.0...1.1.0

1.0.0

10 Feb 13:46

Features

  • Improved performance of the CL100k encoding by 5x
    • Thanks @paplorinc for the great work!
  • Added text-embedding-3-small and text-embedding-3-large to the ModelType enum

Breaking Changes

  • Due to the performance optimization, encode methods now return a custom IntArrayList instead of a List<Integer> to avoid unnecessary boxing. IntArrayList does not implement List, so this is a breaking change. If you are missing any critical functionality from IntArrayList, please raise an issue.
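
A minimal sketch of how the new return type can be consumed (assumptions: IntArrayList lives in com.knuddels.jtokkit.api and exposes size() and get(int) accessors; other helpers it may offer are not shown here):

```java
import com.knuddels.jtokkit.Encodings;
import com.knuddels.jtokkit.api.Encoding;
import com.knuddels.jtokkit.api.EncodingRegistry;
import com.knuddels.jtokkit.api.EncodingType;
import com.knuddels.jtokkit.api.IntArrayList;

public class IntArrayListExample {
    public static void main(String[] args) {
        EncodingRegistry registry = Encodings.newDefaultEncodingRegistry();
        Encoding encoding = registry.getEncoding(EncodingType.CL100K_BASE);

        // encode(...) now returns the primitive-backed IntArrayList instead of List<Integer>
        IntArrayList tokens = encoding.encode("Hello, world!");

        // Copy into a plain int[] if downstream code expects standard arrays;
        // size()/get(int) are assumed accessors on IntArrayList.
        int[] ids = new int[tokens.size()];
        for (int i = 0; i < tokens.size(); i++) {
            ids[i] = tokens.get(i);
        }
        System.out.println("token count: " + ids.length);
    }
}
```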

Full Changelog: 0.6.1...1.0.0

0.6.1

03 Jul 08:37

Fixes

  • Added a workaround to prevent an issue with regex compilation on Android devices

Full Changelog: 0.6.0...0.6.1

0.6.0

30 Jun 14:45

Features

  • Added GPT_3_5_TURBO_16k to the ModelType enum

Full Changelog: 0.5.1...0.6.0

0.5.1

26 Jun 08:42

Fixes

  • Fixed an issue resulting in wrong encodings for Unicode input. Thanks @VoidIsVoid for raising and fixing this issue 🙂

Full Changelog: 0.5.0...0.5.1

0.5.0

16 May 12:11

Features

  • Added a new EncodingRegistry that loads only the requested vocabularies lazily instead of loading all vocabularies eagerly at initialization. Thanks @blackdiz for raising this feature request and implementing it 😊
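
A sketch of the lazy registry next to the default eager one (Encodings.newLazyEncodingRegistry() is assumed to be the factory method this entry refers to):

```java
import com.knuddels.jtokkit.Encodings;
import com.knuddels.jtokkit.api.Encoding;
import com.knuddels.jtokkit.api.EncodingRegistry;
import com.knuddels.jtokkit.api.EncodingType;

public class LazyRegistryExample {
    public static void main(String[] args) {
        // Eager: all vocabularies are parsed when the registry is created.
        EncodingRegistry eager = Encodings.newDefaultEncodingRegistry();

        // Lazy: a vocabulary is only parsed the first time it is requested.
        EncodingRegistry lazy = Encodings.newLazyEncodingRegistry();

        // Only cl100k_base is loaded here; the other encodings stay untouched.
        Encoding cl100k = lazy.getEncoding(EncodingType.CL100K_BASE);
        System.out.println(cl100k.countTokens("lazy loading example"));
    }
}
```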

Full Changelog: 0.4.0...0.5.0

0.4.0

17 Apr 08:30

Features

  • Added two new methods to Encoding: encode(String, int) and encodeOrdinary(String, int). Both methods allow you to pass a maxTokens integer parameter that stops encoding once the given maximum number of tokens is reached. Thanks @radosdesign for raising this feature request and implementing it 😊
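
A usage sketch of the new overload (assumption: it returns an EncodingResult that exposes getTokens() and isTruncated(); the exact result type is not spelled out in these notes):

```java
import com.knuddels.jtokkit.Encodings;
import com.knuddels.jtokkit.api.Encoding;
import com.knuddels.jtokkit.api.EncodingRegistry;
import com.knuddels.jtokkit.api.EncodingResult;
import com.knuddels.jtokkit.api.EncodingType;

public class MaxTokensExample {
    public static void main(String[] args) {
        EncodingRegistry registry = Encodings.newDefaultEncodingRegistry();
        Encoding encoding = registry.getEncoding(EncodingType.CL100K_BASE);

        // Stop encoding once at most 10 tokens have been produced.
        EncodingResult result = encoding.encode("a fairly long input text that will not fit", 10);

        System.out.println("tokens: " + result.getTokens().size());
        System.out.println("was the input truncated? " + result.isTruncated());
    }
}
```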

Breaking Changes

  • The Encoding interface got two new methods: encode(String, int) and encodeOrdinary(String, int). If you implemented this interface yourself, you have to update your implementations when upgrading.

New Contributors

  • @radosdesign made their first contribution in #12

Full Changelog: 0.3.0...0.4.0

0.3.0

15 Apr 10:11

Features

  • Added gpt-4-32k to ModelType
  • Added ModelType#getMaxContextLength which returns the maximum context length the model allows. Note that this context length includes prompt tokens and, where applicable, completion tokens.

Breaking Changes

  • The name and encodingType properties of ModelType were changed from public to private access. Migrate to modelType.getName() and modelType.getEncodingType() if you were previously accessing the fields directly.
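
A sketch of the accessor-based usage after this change, combined with the new getMaxContextLength method (ModelType.GPT_4 is used as an illustrative constant):

```java
import com.knuddels.jtokkit.api.EncodingType;
import com.knuddels.jtokkit.api.ModelType;

public class ModelTypeAccessorsExample {
    public static void main(String[] args) {
        ModelType model = ModelType.GPT_4;

        // Direct field access (model.name / model.encodingType) is no longer possible;
        // use the getters instead.
        String name = model.getName();
        EncodingType encodingType = model.getEncodingType();

        // New in 0.3.0: the maximum context length (prompt plus, where applicable, completion tokens).
        int maxContextLength = model.getMaxContextLength();

        System.out.println(name + " uses " + encodingType + " with up to " + maxContextLength + " tokens");
    }
}
```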

Full Changelog: 0.2.0...0.3.0

0.2.0

06 Apr 07:25

Features

  • Added encodeOrdinary and countTokensOrdinary methods to Encoding.
    • The existing encode and countTokens methods throw an exception if a special token is encountered. The new *Ordinary variants instead encode special tokens as if they were ordinary text.
  • Added getEncodingForModel(String) to EncodingRegistry to allow retrieving the encoding for a model by its string name.
  • It is now possible to call EncodingRegistry#getEncodingForModel(String) with a model snapshot, for example "gpt-4-0314", and receive the correct encoding.
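
A sketch combining both additions (assumptions: getEncodingForModel(String) returns an Optional<Encoding>, and countTokensOrdinary returns an int like countTokens):

```java
import com.knuddels.jtokkit.Encodings;
import com.knuddels.jtokkit.api.Encoding;
import com.knuddels.jtokkit.api.EncodingRegistry;
import com.knuddels.jtokkit.api.EncodingType;

import java.util.Optional;

public class OrdinaryEncodingExample {
    public static void main(String[] args) {
        EncodingRegistry registry = Encodings.newDefaultEncodingRegistry();
        Encoding encoding = registry.getEncoding(EncodingType.CL100K_BASE);

        String textWithSpecialToken = "hello <|endoftext|> world";

        // countTokens(...) would throw here because of the special token;
        // the *Ordinary variants treat it as plain text instead.
        int count = encoding.countTokensOrdinary(textWithSpecialToken);
        System.out.println("ordinary token count: " + count);

        // Look up an encoding by model name, including snapshot names such as "gpt-4-0314".
        Optional<Encoding> forSnapshot = registry.getEncodingForModel("gpt-4-0314");
        forSnapshot.ifPresent(enc -> System.out.println(enc.countTokensOrdinary("snapshot lookup works")));
    }
}
```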

Full Changelog: 0.1.0...0.2.0

0.1.0

20 Mar 21:47

⭐ Initial Release

  • Implementations for the cl100k_base, p50k_base, p50k_edit, and r50k_base encodings
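
For context, a minimal usage sketch against the registry and two of these encodings (class and package names as assumed from the library's public API):

```java
import com.knuddels.jtokkit.Encodings;
import com.knuddels.jtokkit.api.Encoding;
import com.knuddels.jtokkit.api.EncodingRegistry;
import com.knuddels.jtokkit.api.EncodingType;

public class InitialReleaseExample {
    public static void main(String[] args) {
        EncodingRegistry registry = Encodings.newDefaultEncodingRegistry();

        // Any of the four shipped encodings can be requested by its EncodingType constant.
        Encoding cl100k = registry.getEncoding(EncodingType.CL100K_BASE);
        Encoding p50k = registry.getEncoding(EncodingType.P50K_BASE);

        System.out.println(cl100k.countTokens("Hello, world!"));
        System.out.println(p50k.countTokens("Hello, world!"));
    }
}
```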