
juditacs/bpe


bpe

Byte pair encoding

Usage

Learn BPE units:

cat input.txt | python learn_bpe.py -u 10000 > bpe_units
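The README does not describe what learn_bpe.py does internally. For reference, here is a minimal sketch of the standard BPE learning loop: count adjacent symbol pairs over the corpus, merge the most frequent pair, and repeat. It assumes that `-u` sets how many units (merges) to learn. The function name `learn_bpe` and its return value (an ordered list of merged pairs) are hypothetical and need not match the format of the script's `bpe_units` file. It also omits the end-of-word marker that some BPE variants add.

```python
from collections import Counter


def learn_bpe(text: str, num_merges: int) -> list[tuple[str, str]]:
    """Hypothetical sketch: learn up to num_merges pair merges from whitespace-split text."""
    # Each word starts as a tuple of single characters, weighted by its frequency.
    vocab = Counter(tuple(word) for word in text.split())
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across all words.
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[a, b] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties go to the pair counted first
        merges.append(best)
        # Rewrite every word, replacing occurrences of the best pair with one symbol.
        new_vocab: Counter = Counter()
        for symbols, freq in vocab.items():
            out = []
            i = 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges


print(learn_bpe("low low lower", 2))  # [('l', 'o'), ('lo', 'w')]
```

Each iteration adds one new unit, so the number of merges directly controls the size of the learned inventory.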

Apply BPE:

cat input.txt | python apply_bpe.py bpe_units > input.bpe
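A common way to apply learned BPE is to replay the merges, in the order they were learned, on each word. The sketch below illustrates that approach. It is not necessarily how apply_bpe.py works: the script is described as matching units with a longest or shortest strategy, which suggests it segments against the unit inventory instead. The name `apply_merges` and its merge-list input are hypothetical.

```python
def apply_merges(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Hypothetical sketch: segment one word by replaying merges in learned order."""
    symbols = list(word)  # start from single characters
    for a, b in merges:
        out = []
        i = 0
        while i < len(symbols):
            # Merge this adjacent pair wherever it occurs; otherwise copy the symbol.
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols


print(apply_merges("lower", [("l", "o"), ("lo", "w")]))  # ['low', 'e', 'r']
```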

apply_bpe supports two matching strategies: longest and shortest. Longest picks the longest possible match, while shortest picks the shortest one. The separator can also be redefined with the -s switch.
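One plausible reading of the two strategies is greedy left-to-right matching against the set of learned units, taking either the longest or the shortest unit that matches at each position. The sketch below assumes that reading; the function `segment`, its `strategy` argument, the single-character fallback for unmatched text, and the example unit set are all hypothetical, and the script's actual behavior and default separator may differ.

```python
def segment(word: str, units: set[str], strategy: str = "longest") -> list[str]:
    """Hypothetical sketch: greedily split word left to right into known units.

    strategy="longest" takes the longest unit matching at each position,
    strategy="shortest" takes the shortest; a character that starts no
    known unit is emitted on its own.
    """
    if strategy not in ("longest", "shortest"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    pieces = []
    i = 0
    while i < len(word):
        lengths = range(1, len(word) - i + 1)
        if strategy == "longest":
            lengths = reversed(lengths)  # try the longest candidates first
        # First candidate length whose substring is a known unit, else 1 character.
        n = next((k for k in lengths if word[i:i + k] in units), 1)
        pieces.append(word[i:i + n])
        i += n
    return pieces


units = {"lo", "low", "er"}  # hypothetical unit inventory
print(segment("lower", units, "longest"))   # ['low', 'er']
print(segment("lower", units, "shortest"))  # ['lo', 'w', 'er']
```

The pieces would then be joined with the separator (the value set by -s) to produce the output line.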
