Skip to content

legalforce-research/tokenizer-speed-bench

Repository files navigation

Benchmarking of various tokenizers

This repository contains benchmarking tools of various tokenizers.

Overview

To perform benchmarking, you have to run the following two steps.

Preparation

The following commands prepare resources (e.g. model data) and compile source codes.

% git submodule update --init
% ./download_resources.sh
% ./compile_all.sh

Measurement

To measure the speed of each tokenizer, run the following commands. If you stop ./run_all.sh in the middle, ./stats.py will calculate statistics from avaiable results.

% ./run_all.sh | tee ./results
% ./stats.py < ./results

Disclaimer

This software is developed by LegalForce, Inc., but not an officially supported LegalForce product.

License

Licensed under either of

at your option.

For softwares under thirdparty, follow the license terms of each software.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.