
Please provide performance metrics in the benchmarks #122

Closed
nickchomey opened this issue Feb 7, 2023 · 5 comments

Comments

@nickchomey

nickchomey commented Feb 7, 2023

I'm impressed by the accuracy of Lingua as compared to even fasttext, but it would be very useful to also see performance metrics in the benchmarks to determine if that accuracy comes at a cost. Likewise it would be useful for comparing lingua's low and high accuracy modes.

@pemistahl
Owner

In chapter 9.5 of the README it says:
Lingua's high detection accuracy comes at the cost of being noticeably slower than other language detectors.

The statistical models in Lingua are larger than those of similar libraries. So querying them takes more time.

There is a benchmark script in this repo which gives you an idea of how performant the library is. You can run it locally with poetry:

poetry run python3 scripts/benchmark.py
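To put a concrete number on "noticeably slower" (per-text latency and throughput), a small timing harness along the following lines could be used. This is a minimal sketch, not the repository's `scripts/benchmark.py`: the `benchmark` function and the dummy detector are hypothetical, and the commented-out Lingua calls assume the lingua-py `LanguageDetectorBuilder` API (`from_all_languages`, `with_low_accuracy_mode`, `detect_language_of`).

```python
import statistics
import time
from typing import Callable, Sequence


def benchmark(detect: Callable[[str], object], texts: Sequence[str], repeats: int = 5) -> dict:
    """Hypothetical harness: time `detect` over all `texts`, `repeats` times.

    Reports the median time per full pass plus the derived per-text latency
    and throughput, so detectors (or accuracy modes) can be compared.
    """
    if not texts:
        raise ValueError("texts must not be empty")
    pass_times = []
    for _ in range(repeats):
        start = time.perf_counter()
        for text in texts:
            detect(text)
        pass_times.append(time.perf_counter() - start)
    # The median is less sensitive to one-off hiccups (GC, caching) than the mean.
    median_pass = statistics.median(pass_times)
    return {
        "seconds_per_pass": median_pass,
        "seconds_per_text": median_pass / len(texts),
        "texts_per_second": len(texts) / median_pass if median_pass > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Hypothetical stand-in detector so the harness runs without Lingua installed.
    def dummy_detect(text: str) -> str:
        return "ENGLISH" if text.isascii() else "UNKNOWN"

    sample = ["This is a short English sentence.", "Ceci est une phrase française."] * 100
    print(benchmark(dummy_detect, sample))

    # With Lingua installed, high and low accuracy modes could be compared like this
    # (assumes the lingua-py LanguageDetectorBuilder API):
    # from lingua import LanguageDetectorBuilder
    # high = LanguageDetectorBuilder.from_all_languages().build()
    # low = LanguageDetectorBuilder.from_all_languages().with_low_accuracy_mode().build()
    # print(benchmark(high.detect_language_of, sample))
    # print(benchmark(low.detect_language_of, sample))
```

Reporting per-text latency alongside throughput makes it easy to tell whether a gap between detectors is on the order of milliseconds or seconds per text.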

@nickchomey
Author

Thanks, I'll have to give that a try and share some rough results here. I do think it would be nice/useful to present such stats in the official benchmark comparisons as there's no way to know what "noticeably slower" means. I know that Fasttext and cld2 tend to be exceptionally fast, so perhaps noticeably slower is still quite acceptable. But if it's a difference of 0.001s vs 1s, then obviously that's a problem.

@datatalking

@nickchomey I'm relatively new to this repo, but it supports more languages than the translation repo I have been using. I could help test and produce an "output chart", or help craft and then submit a PR for this, so I'm willing to collaborate with you on a few options for generating the stats.

@nickchomey
Author

@datatalking this isn't a focus for me at the moment and probably won't be for at least a few months, so I'm not able to collaborate on anything. But if you have the time and desire to do so, that would be great!

@pemistahl
Owner

Performance metrics are now provided in the README.


3 participants