
jaymody/bpe


A simplified implementation of OpenAI's BPE encoder for GPT-2.

The original implementation can be found in original.py, which was copied from here.

My re-implementation can be found in bpe.py. I simplified much of the code, added type hints, and refactored everything into a functional style (the pairs are merged recursively). This implementation is probably slower than the original.
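To give a rough idea of what merging pairs recursively looks like, here is a minimal sketch. It is not the code from bpe.py: the function name `merge_pairs`, the `ranks` dictionary, and the toy merge table are all assumptions made for illustration. The real GPT-2 merge table is loaded from a vocabulary file and operates on byte-level symbols.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def merge_pairs(tokens: Tuple[str, ...], ranks: Dict[Pair, int]) -> Tuple[str, ...]:
    """Recursively merge the lowest-ranked adjacent pair until no known pair remains.

    `ranks` maps a pair of symbols to its merge priority (lower merges first).
    This is a hypothetical helper, not the function used in bpe.py.
    """
    # Collect the adjacent pairs that appear in the merge table.
    candidates = [pair for pair in set(zip(tokens, tokens[1:])) if pair in ranks]
    if not candidates:
        return tokens  # base case: nothing left to merge

    first, second = min(candidates, key=lambda pair: ranks[pair])

    # Rebuild the sequence, joining every occurrence of the chosen pair.
    merged = []
    i = 0
    while i < len(tokens):
        if i < len(tokens) - 1 and tokens[i] == first and tokens[i + 1] == second:
            merged.append(first + second)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1

    # Recurse on the shorter sequence.
    return merge_pairs(tuple(merged), ranks)


# Toy merge table, invented for this example.
toy_ranks = {("l", "o"): 0, ("lo", "w"): 1, ("e", "r"): 2}
print(merge_pairs(tuple("lower"), toy_ranks))  # ('low', 'er')
```

Each call merges at least one pair, so the sequence gets strictly shorter and the recursion terminates. An iterative loop would avoid the call overhead, which is one plausible reason a recursive version is slower.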

To check that this implementation gives identical outputs to the original when encoding some_text_file.txt, run:

$ python test.py some_text_file.txt
✅ test passed (encode -> decode recovers input text)
✅ test passed (gives same output as original implementation)
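The two checks reported above can be sketched roughly as follows. This is not the contents of test.py: `check_encoder` and the toy byte-level `toy_encode` / `toy_decode` functions are hypothetical stand-ins, used only so the example runs without the GPT-2 vocabulary files.

```python
from typing import Callable, List, Tuple

Encode = Callable[[str], List[int]]
Decode = Callable[[List[int]], str]


def check_encoder(
    text: str, encode: Encode, decode: Decode, reference_encode: Encode
) -> Tuple[bool, bool]:
    """Return (round_trip_ok, matches_reference) for one input text."""
    ids = encode(text)
    round_trip_ok = decode(ids) == text            # encode -> decode recovers input
    matches_reference = ids == reference_encode(text)  # same ids as the reference
    return round_trip_ok, matches_reference


# Toy stand-ins (assumptions for this sketch): one token id per UTF-8 byte.
def toy_encode(text: str) -> List[int]:
    return list(text.encode("utf-8"))


def toy_decode(ids: List[int]) -> str:
    return bytes(ids).decode("utf-8")


print(check_encoder("hello wörld", toy_encode, toy_decode, toy_encode))  # (True, True)
```

In a real comparison, `encode` and `decode` would come from the re-implementation and `reference_encode` from the original encoder.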

Note that you'll need to install the regex package:

$ pip install regex

Tested with Python 3.9.6.
