Please provide performance metrics in the benchmarks #122

nickchomey · 2023-02-07T18:46:30Z

I'm impressed by the accuracy of Lingua compared to even fasttext, but it would be very useful to also see performance metrics in the benchmarks to determine whether that accuracy comes at a cost. Likewise, they would be useful for comparing Lingua's low and high accuracy modes.

pemistahl · 2023-02-08T08:09:39Z

In chapter 9.5 of the README it says:
Lingua's high detection accuracy comes at the cost of being noticeably slower than other language detectors.

The statistical models in Lingua are larger than those of similar libraries. So querying them takes more time.

There is a benchmark script in this repo which gives you an idea of how the library performs. You can run it locally with poetry:

poetry run python3 scripts/benchmark.py

nickchomey · 2023-02-08T11:04:44Z

Thanks, I'll have to give that a try and share some rough results here. I do think it would be nice/useful to present such stats in the official benchmark comparisons as there's no way to know what "noticeably slower" means. I know that Fasttext and cld2 tend to be exceptionally fast, so perhaps noticeably slower is still quite acceptable. But if it's a difference of 0.001s vs 1s, then obviously that's a problem.
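To put a number on "noticeably slower", a rough per-call timing harness along these lines could help. This is only a sketch and not the repo's `scripts/benchmark.py`: `time_detector` and `dummy_detect` are hypothetical names made up for illustration, and the stand-in detector exists only so the snippet runs without any library installed.

```python
import statistics
import time
from typing import Callable, Dict, Iterable


def time_detector(
    detect: Callable[[str], object],
    texts: Iterable[str],
    repeats: int = 5,
) -> Dict[str, float]:
    """Time `detect` over `texts` and report seconds per call.

    Hypothetical helper for illustration; each repeat runs every text once
    and records the average time per call for that repeat.
    """
    texts = list(texts)
    per_call = []
    for _ in range(repeats):
        start = time.perf_counter()
        for text in texts:
            detect(text)
        elapsed = time.perf_counter() - start
        per_call.append(elapsed / len(texts))
    return {
        "mean_s": statistics.mean(per_call),
        "median_s": statistics.median(per_call),
        "calls": repeats * len(texts),
    }


def dummy_detect(text: str) -> str:
    # Stand-in detector (an assumption, not a real library call):
    # swap in the detection function of the library you want to measure.
    return "en" if " the " in f" {text.lower()} " else "unknown"


sample = [
    "The quick brown fox jumps over the lazy dog",
    "Der schnelle braune Fuchs springt",
    "Le renard brun saute par-dessus le chien",
]
result = time_detector(dummy_detect, sample)
print(f"mean {result['mean_s']:.6f}s, median {result['median_s']:.6f}s per call")
```

Passing each library's detection function as `detect` on the same `sample` would give directly comparable per-call numbers, for example one detector built in high accuracy mode and one in low accuracy mode. Model loading time is not separated out here, so the first repeat may be slower than the rest.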

datatalking · 2023-05-01T19:15:32Z

@nickchomey I'm relatively new to this repo, but it has more languages than the translation repo I have been using. I could help test and show an "output chart", or help craft and then submit a PR for this, so I'm willing to collaborate with you on a few options for generating the stats.

nickchomey · 2023-05-01T19:23:13Z

@datatalking this isn't a focus for me at the moment and probably won't be for at least a few months, so I'm not able to collaborate on anything. But if you have the time and desire to do so, that would be great!

pemistahl · 2023-12-19T13:55:16Z

Performance metrics are now provided in the README.

pemistahl closed this as completed Dec 19, 2023
