Highly specialized crate to parse and use `google/sentencepiece` 's precompiled_charsmap in `tokenizers`

huggingface/spm_precompiled

spm_precompiled

This crate aims to emulate the `Darts::DoubleArray` struct and the Normalizer of https://github.com/google/sentencepiece. Its main intent is to be used with tokenizers, a Rust library that provides facilities to tokenize strings for use with HuggingFace's transformers library.

This crate is highly specialized and not intended for general use.

The core of the algorithm is to read spm's binary precompiled_charsmap.
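
As a rough illustration of what reading that blob involves, here is a minimal sketch. It assumes the layout is a 4-byte little-endian size header, then a double-array trie stored as little-endian `u32` units, then the remaining bytes holding the normalized replacement strings. The names `CharsMap` and `split_charsmap` are hypothetical and are not this crate's API.

```rust
/// Hypothetical container for the two parts of the blob (not the crate's own type).
#[derive(Debug, PartialEq)]
struct CharsMap {
    /// Double-array trie units, decoded as little-endian u32 values.
    trie: Vec<u32>,
    /// Raw bytes of the normalized replacement strings.
    normalized: Vec<u8>,
}

/// Splits a precompiled_charsmap blob under the assumed layout:
/// <trie size in bytes: u32 LE><trie units: u32 LE ...><normalized bytes>.
/// Returns None if the blob is too short or the size header is inconsistent.
fn split_charsmap(blob: &[u8]) -> Option<CharsMap> {
    // Read the 4-byte header giving the trie size in bytes.
    let header = blob.get(..4)?;
    let trie_size =
        u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;

    // The trie is an array of u32 units, so its byte size must be a multiple of 4.
    if trie_size % 4 != 0 {
        return None;
    }

    // Slice out the trie bytes, failing cleanly if the blob is truncated.
    let end = 4usize.checked_add(trie_size)?;
    let trie_bytes = blob.get(4..end)?;
    let trie = trie_bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();

    // Everything after the trie is treated as the normalized-string table.
    let normalized = blob[end..].to_vec();

    Some(CharsMap { trie, normalized })
}
```

Looking a string up in the trie and mapping it to an offset in the normalized bytes is a separate step that this sketch does not attempt.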
