Tiktokenex

Pure Elixir BPE tokenizer compatible with OpenAI's tiktoken. No NIFs, no Python, no external dependencies.

Supports cl100k_base (GPT-4, GPT-3.5) and o200k_base (GPT-4o) encodings.

Usage

# Encode text to token IDs
Tiktokenex.encode("Hello, world!")
#=> [9906, 11, 1917, 0]

# Decode back to text
Tiktokenex.decode([9906, 11, 1917, 0])
#=> "Hello, world!"

# Count tokens
Tiktokenex.count("Hello, world!")
#=> 4

# See the BPE chunks
Tiktokenex.encode_to_chunks("Hello, world!")
#=> ["Hello", ",", " world", "!"]

# Use o200k_base encoding
Tiktokenex.encode("Hello", :o200k_base)

Installation

Add to your mix.exs as a git or path dependency:

def deps do
  [
    # git dependency
    {:tiktokenex, git: "https://github.com/phiat/tiktokenex.git"}
    # or, instead of the line above, a sibling working copy for development:
    # {:tiktokenex, path: "../tiktokenex"}
  ]
end

BPE rank files are not tracked in git — fetch them once with the bundled justfile recipe:

git clone https://github.com/phiat/tiktokenex.git
cd tiktokenex
just setup        # mix deps.get + downloads cl100k_base + o200k_base into priv/ranks/

Or download manually:

mkdir -p priv/ranks
curl -o priv/ranks/cl100k_base.tiktoken \
  https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken
curl -o priv/ranks/o200k_base.tiktoken \
  https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken
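For orientation, each line of a .tiktoken file holds a base64-encoded token (raw bytes), a space, and that token's integer rank. The following is a minimal sketch of parsing such a file and caching the result in persistent_term. The module name `RanksSketch`, the cache key, and the `load/2` signature are hypothetical and are not taken from the library's source.

```elixir
defmodule RanksSketch do
  # Parse the text of a .tiktoken file into %{token_bytes => rank}.
  def parse(contents) do
    contents
    |> String.split("\n", trim: true)
    |> Map.new(fn line ->
      [b64, rank] = String.split(line, " ", parts: 2)
      {Base.decode64!(b64), String.to_integer(String.trim(rank))}
    end)
  end

  # Hypothetical loader: read the file once per encoding, then serve
  # later calls from :persistent_term.
  def load(encoding, path) do
    key = {__MODULE__, encoding}

    case :persistent_term.get(key, nil) do
      nil ->
        ranks = path |> File.read!() |> parse()
        :persistent_term.put(key, ranks)
        ranks

      ranks ->
        ranks
    end
  end
end

# "IQ==" is the base64 encoding of "!"
RanksSketch.parse("IQ== 0\n")
#=> %{"!" => 0}
```

With the files downloaded as above, `RanksSketch.load(:cl100k_base, "priv/ranks/cl100k_base.tiktoken")` would parse on the first call and read from the cache afterwards.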

How It Works

1. Pre-tokenization (Pretokenizer): splits text into coarse chunks using tiktoken's regex patterns.
2. BPE encoding (BPE): applies byte-pair encoding merges within each chunk, using the rank tables.
3. Rank loading (Ranks): parses .tiktoken rank files and caches them in persistent_term.
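The pre-tokenization step can be sketched with a single regex scan. The pattern below is modelled on the split pattern published for cl100k_base and should be read as an approximation; the library's own pattern may differ in detail. `PretokenizeSketch` and its functions are hypothetical names used only for illustration.

```elixir
defmodule PretokenizeSketch do
  # Approximation of the cl100k_base split pattern (an assumption, not
  # copied from the library). Alternatives, in order: English contractions,
  # words with an optional leading non-letter, runs of 1-3 digits,
  # punctuation runs, newlines, and remaining whitespace.
  def pattern do
    ~r/(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+/u
  end

  # Return the coarse chunks in order. The pattern has no capture groups,
  # so every match is a one-element list and flattening is safe.
  def split(text) do
    pattern()
    |> Regex.scan(text)
    |> List.flatten()
  end
end

PretokenizeSketch.split("Hello, world!")
#=> ["Hello", ",", " world", "!"]
```

Each chunk is then encoded independently, so merges never cross a chunk boundary.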

The algorithm matches tiktoken's output exactly. See test/ for reference vectors.
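To illustrate the merge step, here is a minimal sketch of byte-pair merging over one chunk. It assumes a toy rank table rather than a real .tiktoken file, and it is a straightforward quadratic version, not necessarily how the library implements it. In tiktoken's format a token's rank is also its token ID, which the sketch relies on. `BpeSketch` and its rank table are hypothetical.

```elixir
defmodule BpeSketch do
  # Hypothetical toy rank table: lower rank = merged earlier.
  @ranks %{"a" => 0, "b" => 1, "c" => 2, "ab" => 3, "abc" => 4}

  # Encode one pre-tokenized chunk into ranks (token IDs).
  def encode(chunk, ranks \\ @ranks) do
    chunk
    |> :binary.bin_to_list()
    |> Enum.map(&<<&1>>)
    |> merge(ranks)
    |> Enum.map(&Map.fetch!(ranks, &1))
  end

  # Repeatedly merge the adjacent pair with the lowest rank (leftmost on
  # ties) until no adjacent pair appears in the rank table.
  defp merge(parts, ranks) do
    candidates =
      parts
      |> Enum.chunk_every(2, 1, :discard)
      |> Enum.with_index()
      |> Enum.flat_map(fn {[left, right], index} ->
        case Map.fetch(ranks, left <> right) do
          {:ok, rank} -> [{rank, index}]
          :error -> []
        end
      end)

    case candidates do
      [] ->
        parts

      _ ->
        {_rank, index} = Enum.min(candidates)
        {before, [a, b | rest]} = Enum.split(parts, index)
        merge(before ++ [a <> b | rest], ranks)
    end
  end
end

BpeSketch.encode("abcab")
#=> [4, 3]
```

In the example, both "ab" pairs merge first (rank 3), then "ab" and "c" merge into "abc" (rank 4), leaving the parts "abc" and "ab".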

API

| Function | Description |
| --- | --- |
| `encode(text, encoding)` | Text to token ID list |
| `decode(ids, encoding)` | Token IDs back to text |
| `encode_to_chunks(text, encoding)` | Text to BPE chunk strings |
| `count(text, encoding)` | Token count |

Default encoding is :cl100k_base. Pass :o200k_base as the second argument for GPT-4o tokenization.

Tests

just check    # mix test + credo + compile-with-warnings-as-errors
just test     # tests only

License

MIT — see LICENSE.
