Skip to content

ianks/tokenizers-ruby

 
 

Repository files navigation

Tokenizers Ruby

🙂 Fast state-of-the-art tokenizers for Ruby

Build Status

Installation

Add this line to your application’s Gemfile:

gem "tokenizers"

Note: Rust is currently required for installation.

Getting Started

Load a pretrained tokenizer

tokenizer = Tokenizers.from_pretrained("bert-base-cased")

Encode

encoded = tokenizer.encode("I can feel the magic, can you?")
encoded.ids
encoded.tokens

Decode

tokenizer.decode(ids)

Load a tokenizer from files

tokenizer = Tokenizers::CharBPETokenizer.new("vocab.json", "merges.txt")

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

To get started with development:

git clone https://github.com/ankane/tokenizers-ruby.git
cd tokenizers-ruby
bundle install
bundle exec ruby ext/tokenizers/extconf.rb && make
bundle exec rake download:files
bundle exec rake test

About

Fast state-of-the-art tokenizers for Ruby

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Rust 68.2%
  • Ruby 31.8%