
Profile-Guided Optimization (PGO) benchmark results #1426

Closed
zamazan4ik opened this issue Jan 7, 2024 · 4 comments

zamazan4ik commented Jan 7, 2024

Hi!

Writing this down for the record. Maybe these results will be interesting to someone who is trying to achieve better performance with tokenizers, since the project cares about performance.

I test Profile-Guided Optimization (PGO) on different kinds of software; the current results (along with a lot of other PGO-related information) are available here. That's why I tried to optimize tokenizers with PGO too.

Test environment

I performed tests on my Linux-based machine.

Linux:

  • Fedora 39
  • Linux kernel 6.6.9
  • AMD Ryzen 9 5900x
  • 48 GiB RAM
  • SSD Samsung 980 Pro 2 TiB
  • Compiler - Rustc 1.75
  • Tokenizers version: the latest for now from the main branch on commit f1c23b868006ee27acdd31796677f82fa10d6bd7
  • Disabled Turbo boost (for more stable results across runs)

Benchmarks

As a benchmark, I use the built-in benchmarks, run with the cargo bench -- --verbose command from the Makefile (if you want to reproduce my results, please check #1425 first). For the PGO training phase, I use cargo-pgo, running the same benchmarks with cargo pgo bench -- --verbose. For the PGO optimization phase, I use cargo pgo optimize bench -- --verbose.
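For clarity, here is a minimal sketch of that workflow, assuming cargo-pgo is installed (cargo install cargo-pgo) and the LLVM tools are available (rustup component add llvm-tools-preview):

```bash
# Training phase: build with instrumentation and run the benchmarks
# to collect profiles.
cargo pgo bench -- --verbose

# Optimization phase: rebuild using the collected profiles and re-run
# the same benchmarks to compare.
cargo pgo optimize bench -- --verbose
```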

Results

I got the following results:

[The detailed benchmark results were attached to the original issue.]

As you can see, in general, Tokenizers' performance can be improved with PGO. I think this information could be added somewhere in the documentation, so users are aware of the effect of PGO on Tokenizers' performance and can decide whether to apply PGO to their own Tokenizers builds.
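As a sketch of what such documentation could say: users can also drive rustc's built-in PGO flags directly, without cargo-pgo (the paths and the workload binary below are placeholders):

```bash
# 1. Build with profile instrumentation.
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" cargo build --release

# 2. Run a representative tokenization workload to collect .profraw files
#    (hypothetical binary name - substitute your own workload).
./target/release/my-tokenizers-workload

# 3. Merge the raw profiles (llvm-profdata ships with the
#    llvm-tools-preview rustup component).
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data

# 4. Rebuild with the merged profile applied.
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" cargo build --release
```

This is essentially the sequence that cargo-pgo automates.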

I already see some PGO mentions in the CI scripts, but it's not clear whether the Tokenizers packages are PGO-optimized or not. As far as I can tell from the build scripts, they are not (but I could be wrong - please correct me if that's the case).

Please treat this issue just as a benchmark report - it's not an actual error, crash, or anything like that.


Narsil (Collaborator) commented Jan 8, 2024

Thanks for opening this.

If I read correctly, the improvements are in the 5-10% range, correct?
Overall that's nice, but those benchmarks are not really representative enough to be used currently.

The reason is that tokenizers is made super modular (in order to support many different kinds of tokenizers, pretty much all of those used in ML), and performance depends heavily on the chosen combination of normalizers/pre_tokenizers/models (see the sketch below). Therefore I wouldn't use PGO just yet.
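To illustrate the modularity, here's a rough sketch against the tokenizers Rust API of that era (method names and exact signatures may differ between versions):

```rust
use tokenizers::models::bpe::BPE;
use tokenizers::normalizers::unicode::NFC;
use tokenizers::pre_tokenizers::whitespace::Whitespace;
use tokenizers::tokenizer::{Result, Tokenizer};

fn main() -> Result<()> {
    // The model is only one component of the pipeline...
    let bpe = BPE::from_file("vocab.json", "merges.txt")
        .unk_token("[UNK]".into())
        .build()?;
    let mut tokenizer = Tokenizer::new(bpe);

    // ...the normalizer and pre-tokenizer are swapped in independently,
    // so the hot code paths differ for every combination.
    tokenizer
        .with_normalizer(NFC)
        .with_pre_tokenizer(Whitespace::default());

    let encoding = tokenizer.encode("Hey there!", false)?;
    println!("{:?}", encoding.get_tokens());
    Ok(())
}
```

A profile collected while benchmarking one such pipeline mostly exercises that pipeline's code paths, which is why the built-in benchmarks aren't a universal PGO training workload.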

If you care about tokenizer performance that badly (in ML the tokenizer is now a mostly negligible part of the runtime, since it's not Python anymore), I encourage you to look at https://github.com/microsoft/BlingFire, which claims even faster tokenization (the fastest claim I'm aware of).
There are also other libraries out there which claim faster performance.

tokenizers, being very general, cannot be the fastest library compared to highly specialized code for a given tokenizer. In the realm of LMs, though, it shouldn't matter that much anymore.

zamazan4ik (Author) commented

> If I read correctly, the improvements are in the 5-10% range, correct?

In general - yes, you are right. However, in some tests, like "BPE GPT2 encode, no cache", the improvements are up to 20%.

> but those benchmarks are not really representative enough to be used currently.

Hmm, that's interesting. What is the current purpose of these benchmarks?

> Therefore I wouldn't use PGO just yet.

Fair point. Even if you don't want to integrate PGO into the Tokenizers build pipeline with some predefined PGO workload, that's completely fine - I understand the difficulty of that path. At least the numbers above could be interesting for Tokenizers users who care about performance (and have no way/time/money to switch to another tokenizer implementation). I hope the results are visible enough in this issue :)

Thanks a lot for the links to other tokenizers - I will try to optimize them with PGO as well.


Narsil (Collaborator) commented Jan 10, 2024

> Hmm, that's interesting. What is the current purpose of these benchmarks?

Well, to have an idea of how tokenizers works on a particularly useful task - not enough to guide PGO :)
And yes, performance is most likely biased towards that particular tokenizer (it is in general biased towards space-separated tokenizers, which are less and less used).


This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
