Reproducing Benchmark Results #21

mert-kurttutan · 2023-01-22T08:52:38Z

Hi,

I want to reproduce the results presented in README.md, to the extent my hardware allows. I am aware of the scripts/benchmark.py file and was able to run tiktoken with different numbers of threads. But I could not set the number of threads for huggingface tokenizers. I tried the environment variable RAYON_RS_NUM_CPUS, but the number of threads did not change.

Any help is appreciated!

mert-kurttutan · 2023-01-22T16:56:21Z

It turns out the environment variable used for this setting has changed to RAYON_NUM_THREADS. It is working now.

hauntsaninja · 2023-01-22T22:35:27Z

Yeah, I was using the environment variable RAYON_NUM_THREADS to control how many threads huggingface tokenizers used. For convenience in the benchmark, I also use the value of that environment variable to set num_threads for tiktoken:

tiktoken/scripts/benchmark.py, line 16 in cf385ca:

    num_threads = int(os.environ["RAYON_NUM_THREADS"])

tiktoken/scripts/benchmark.py, line 24 in cf385ca:

    enc.encode_ordinary_batch(documents, num_threads=num_threads)

(Note that no environment variable affects the thread count used in tiktoken proper; tiktoken does not use rayon, so its thread count comes only from the num_threads argument.)
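
For anyone adapting this outside the benchmark script, here is a minimal sketch of the same pattern. The helper names `threads_from_env` and `encode_batch`, the default of 1 thread, and the choice of the "gpt2" encoding are assumptions for illustration and are not part of tiktoken or scripts/benchmark.py. Unlike the script, which raises a KeyError when the variable is unset, this sketch falls back to a default.

```python
import os


def threads_from_env(var="RAYON_NUM_THREADS", default=1):
    """Read a thread count from an environment variable.

    Hypothetical helper: returns `default` when the variable is unset
    or empty, and rejects non-positive values.
    """
    raw = os.environ.get(var)
    if raw is None or not raw.strip():
        return default
    value = int(raw)
    if value < 1:
        raise ValueError(f"{var} must be a positive integer, got {raw!r}")
    return value


def encode_batch(documents, encoding_name="gpt2"):
    """Encode documents with tiktoken using the thread count from the environment.

    tiktoken itself ignores RAYON_NUM_THREADS; the value only takes effect
    because it is passed explicitly as num_threads here.
    """
    import tiktoken  # imported lazily so the helper above works without it

    enc = tiktoken.get_encoding(encoding_name)
    return enc.encode_ordinary_batch(documents, num_threads=threads_from_env())
```

With RAYON_NUM_THREADS=4 set in the environment before the process starts, `encode_batch(["hello world"])` would pass `num_threads=4` to tiktoken, and huggingface tokenizers running in the same process would pick up the same value through rayon.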

hauntsaninja · 2023-01-22T22:36:15Z

Please let me know if you have any further difficulty reproducing benchmark numbers!

mert-kurttutan changed the title ~~Reproducing Results~~ Reproducing Benchmark Results Jan 22, 2023

mert-kurttutan closed this as completed Jan 22, 2023
