Skip to content

Latest commit

 

History

History
22 lines (16 loc) · 650 Bytes

README.md

File metadata and controls

22 lines (16 loc) · 650 Bytes

Run script/compile to build the Rust lib.

Enter the clojure directory and start a REPL with lein repl.

(def t (get-tokenizer "roberta-base-vocab.json" "roberta-base-merges.txt")) ;; this contains a pointer to a tokenizer
(time (get-tokens t "hello there"))
"Elapsed time: 0.074647 msecs"
["hell" "ot" "here"]

You can get the .json and .txt files here:

https://s3.amazonaws.com/models.huggingface.co/bert/roberta-base-vocab.json
https://s3.amazonaws.com/models.huggingface.co/bert/roberta-base-merges.txt

License

Copyright © 2020 Vladimir Mladenovic

Distributed under the EPL License, same as Clojure. See LICENSE.