
Load models slightly more eagerly and reuse for all ngrams during detection. #82

Closed
wants to merge 1 commit

Conversation

@serega serega commented Jul 4, 2022

The library emphasizes lazy loading in order to be efficient in certain serverless environments. My use case requires maximum performance, and I can accept a slower startup in exchange for better runtime performance. The LanguageDetectorBuilder has a flag with_preloaded_language_models that forces eager loading of the models. However, during detection the LanguageDetector loops through every ngram and calls the function load_language_models, which has to take a read lock to check whether the model is loaded. Read locks are cheap, but not free. As an experiment I eliminated lazy loading entirely and stored the models directly in the LanguageDetector. The single-threaded benchmark improved by 1.2x-2x and the multi-threaded one by 2x-4x, depending on the system.
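To illustrate the cost being described, here is a minimal stand-in for a lazily loaded model store (the struct and field names are hypothetical, not lingua's actual internals): every single ngram lookup re-acquires the read lock.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Hypothetical stand-in for a lazily loaded ngram model store:
// every probability lookup first takes a read lock to check whether
// the model for the requested language has already been loaded.
struct LazyModels {
    models: RwLock<HashMap<String, HashMap<String, f64>>>,
}

impl LazyModels {
    fn probability(&self, language: &str, ngram: &str) -> f64 {
        // One read-lock acquisition per ngram: cheap, but not free,
        // and a point of contention when many threads detect at once.
        let guard = self.models.read().unwrap();
        guard
            .get(language)
            .and_then(|model| model.get(ngram))
            .copied()
            .unwrap_or(0.0)
    }
}

fn main() {
    let mut english = HashMap::new();
    english.insert("th".to_string(), 0.027);
    let mut models = HashMap::new();
    models.insert("English".to_string(), english);
    let lazy = LazyModels { models: RwLock::new(models) };

    // Every ngram lookup in a detection loop pays for the lock again.
    assert_eq!(lazy.probability("English", "th"), 0.027);
    assert_eq!(lazy.probability("English", "zz"), 0.0);
}
```

In a loop over thousands of ngrams, and across many detector threads, those repeated lock acquisitions add up.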

Patch

This patch keeps the lazy-loading design. Instead of lazily loading the model for every ngram, I load the models slightly more eagerly in compute_sum_of_ngram_probabilities and in count_unigrams for the specified language, and reuse them for all ngrams in the loop. For this to work I had to use Arc instead of Box in BoxedLanguageModel.
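A hedged sketch of the idea (names are illustrative, not lingua's exact internals): with models behind Arc, one pointer clone can be taken per language and then reused for every ngram in the loop, so the lock is touched once instead of once per ngram.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

type Model = HashMap<String, f64>;

// Illustrative sketch only: models live behind Arc, so a cheap Arc
// clone can be taken once per language and reused for all ngrams.
struct Detector {
    models: RwLock<HashMap<String, Arc<Model>>>,
}

impl Detector {
    // Take the read lock once and return a cheap pointer clone.
    fn load_model(&self, language: &str) -> Option<Arc<Model>> {
        self.models.read().unwrap().get(language).cloned()
    }

    fn sum_of_ngram_probabilities(&self, language: &str, ngrams: &[&str]) -> f64 {
        // One lock acquisition for the whole ngram loop
        // instead of one acquisition per ngram.
        let model = match self.load_model(language) {
            Some(model) => model,
            None => return 0.0,
        };
        ngrams
            .iter()
            .map(|ngram| model.get(*ngram).copied().unwrap_or(0.0))
            .sum()
    }
}

fn main() {
    let mut english = Model::new();
    english.insert("th".to_string(), 0.6);
    english.insert("he".to_string(), 0.4);
    let mut models = HashMap::new();
    models.insert("English".to_string(), Arc::new(english));
    let detector = Detector { models: RwLock::new(models) };

    let sum = detector.sum_of_ngram_probabilities("English", &["th", "he", "zz"]);
    assert_eq!(sum, 1.0);
}
```

Arc is what makes the reuse possible: a Box cannot be handed out of the map while the lock guard is released, but an Arc clone keeps the model alive independently of the lock.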

Benchmark

For the benchmark I used the accuracy reports test data. The benchmark code is here. I tested both single-threaded and multi-threaded/parallel mode.

Results

I tested the patch on two machines

  • Intel(R) Core(TM) i7-6800K CPU @ 3.40GHz on Linux
  • M1 Max on Mac

The numbers in the Before and After columns are throughput in detections per second.

Single threaded benchmark

cargo run --release --bin bench -- --max-examples 30000

| Machine  | Before | After | Change |
|----------|--------|-------|--------|
| i7-6800K | 1162   | 1294  | 1.11x  |
| M1 Max   | 1592   | 1844  | 1.15x  |

Multi threaded benchmark

cargo run --release --bin bench -- --max-examples 50000 --parallel

| Machine  | Before | After | Change |
|----------|--------|-------|--------|
| i7-6800K | 4226   | 8033  | 1.9x   |
| M1 Max   | 1383   | 5858  | 4.23x  |
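The actual benchmark code lives in the linked repository; as a rough illustration of what the parallel mode measures, here is a minimal scoped-thread harness with a placeholder detect function (all names here are hypothetical): the examples are split across threads, each detection is counted, and throughput is the count divided by elapsed time.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Placeholder for the real LanguageDetector work.
fn detect(text: &str) -> usize {
    text.chars().count()
}

// Split the examples across threads and count completed detections.
fn run_parallel(examples: &[&str], num_threads: usize) -> usize {
    let done = AtomicUsize::new(0);
    let chunk_size = (examples.len() / num_threads).max(1);
    thread::scope(|scope| {
        let done = &done;
        for chunk in examples.chunks(chunk_size) {
            scope.spawn(move || {
                for text in chunk {
                    let _ = detect(text);
                    done.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    done.into_inner()
}

fn main() {
    let examples = ["hello", "bonjour", "hallo", "ciao", "hola", "olá"];
    assert_eq!(run_parallel(&examples, 3), examples.len());
}
```

With the real detector substituted for the placeholder, a shared lock inside detection becomes the shared resource all these threads contend on, which is exactly where the multi-threaded numbers diverge from the single-threaded ones.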

I am really surprised by the numbers on the M1 chip. Before the patch, the single-threaded benchmark ran faster than the multi-threaded one, which suggests that contended RwLocks are particularly slow on the M1 Mac.

I ran the accuracy reports and compared them against the current main branch:

diff -r accuracy-reports/lingua/ ../lingua-rs-main/accuracy-reports/lingua/

The command found no differences.

@pemistahl
Owner

Hi @serega, thank you for this PR. This looks interesting. I'm quite busy currently but will definitely look into your changes as soon as I find the time.

@serega
Author

serega commented Jan 19, 2023

Hi @pemistahl. I have two more relatively simple patches for performance improvements that cut processing time by another half. I use these patches in production, but I would prefer to use your original project. I'm wondering whether I should submit them.

@pemistahl
Owner

Hi @serega, I'm sorry for not having merged your changes yet. I was busy with other projects in the last months. But I plan to start working on the Rust implementation of Lingua again. I maintain four implementations of Lingua (Rust, Go, Kotlin, Python). The Go and Python ones are the newest as far as new features and improvements are concerned. The goal is to adapt those features and improvements to the Rust and Kotlin implementations as well.

Can you please update this PR with your new patches? I will then do a new review. Thank you.

@serega
Author

serega commented Jan 21, 2023

Hi @pemistahl. I know about the other implementations too, thanks 👍. We have a team using the Kotlin version, although I know they are having issues with memory usage.
I think having multiple unrelated changes in the same pull request would make it too challenging to review. Pull request #148 contains one of the two other patches I use.

@pemistahl
Owner

I have now implemented this concept in c0f7e71, so this PR can be closed.

@pemistahl pemistahl closed this May 24, 2023
@pemistahl pemistahl added this to the Lingua 1.5.0 milestone Jun 10, 2023