Commit

Use compact string type to reduce memory consumption (#198)
pemistahl committed Jun 13, 2023
1 parent 3ba41b9 commit cd4882d
Showing 7 changed files with 195 additions and 50 deletions.
29 changes: 29 additions & 0 deletions Cargo.lock

1 change: 1 addition & 0 deletions Cargo.toml
@@ -53,6 +53,7 @@ required-features = ["benchmark"]
 
 [dependencies]
 brotli = "3.3.4"
+compact_str = "0.7.0"
 fraction = "0.13.1"
 include_dir = "0.7.3"
 itertools = "0.10.5"
26 changes: 16 additions & 10 deletions README.md
@@ -2654,10 +2654,10 @@ Whichlang has the shortest processing time, Lingua the longest.
 
 | | **Single Thread** | **Multiple Threads** |
 |--------------------------------------------------|-------------------|----------------------|
-| _Lingua / high accuracy mode / all languages_ | 756.18 ms | 112.54 ms |
-| _Lingua / high accuracy mode / common languages_ | 336.86 ms | 34.351 ms |
-| _Lingua / low accuracy mode / all languages_ | 370.87 ms | 49.002 ms |
-| _Lingua / low accuracy mode / common languages_ | 186.89 ms | 22.897 ms |
+| _Lingua / high accuracy mode / all languages_ | 622.00 ms | 96.648 ms |
+| _Lingua / high accuracy mode / common languages_ | 333.31 ms | 37.347 ms |
+| _Lingua / low accuracy mode / all languages_ | 373.15 ms | 48.182 ms |
+| _Lingua / low accuracy mode / common languages_ | 180.54 ms | 24.550 ms |
 | _Whichlang_ | 2.0458 ms | 351.03 µs |
 | _Whatlang / all languages_ | 113.08 ms | 12.992 ms |
 | _Whatlang / common languages_ | 47.742 ms | 5.6070 ms |
@@ -2831,11 +2831,17 @@ value 1.0 will always be returned for this language. The other languages will re
 There is also a method for returning the confidence value for one specific language only:
 
 ```rust
-let confidence = detector.compute_language_confidence("languages are awesome", French);
-println!("{:.2}", confidence);
+use lingua::Language::{English, French, German, Spanish};
+use lingua::LanguageDetectorBuilder;
 
-// Output:
-// 0.04
+fn main() {
+    let languages = vec![English, French, German, Spanish];
+    let detector = LanguageDetectorBuilder::from_languages(&languages).build();
+    let confidence = detector.compute_language_confidence("languages are awesome", French);
+    let rounded_confidence = (confidence * 100.0).round() / 100.0;
+
+    assert_eq!(rounded_confidence, 0.04);
+}
 ```
 
 The value that this method computes is a number between 0.0 and 1.0.
@@ -2876,8 +2882,8 @@ of less than 120 characters will drop significantly. However, detection accuracy
 texts which are longer than 120 characters will remain mostly unaffected.
 
 In high accuracy mode (the default), the language detector consumes approximately
-1,200 MB of memory if all language models are loaded. In low accuracy mode, memory
-consumption is reduced to approximately 90 MB. The goal is to further reduce memory
+970 MB of memory if all language models are loaded. In low accuracy mode, memory
+consumption is reduced to approximately 72 MB. The goal is to further reduce memory
 consumption in later releases.
 
 An alternative for a smaller memory footprint and faster performance is to reduce the set
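The updated README memory figures can be checked against the 20 % saving stated in RELEASE_NOTES.md with a small calculation: the relative reduction is (before − after) / before. In the minimal sketch below, `relative_reduction` is a hypothetical helper (not part of lingua), and the inputs are the approximate MB values from the README diff.

```rust
/// Relative reduction from `before` to `after`, as a fraction of `before`.
/// Hypothetical helper for illustration only; not part of the lingua API.
fn relative_reduction(before: f64, after: f64) -> f64 {
    (before - after) / before
}

fn main() {
    // Approximate memory consumption in MB (before, after), taken from the README diff.
    let high_accuracy = relative_reduction(1200.0, 970.0);
    let low_accuracy = relative_reduction(90.0, 72.0);

    // Prints "high accuracy mode: 19.2 %" and "low accuracy mode: 20.0 %".
    println!("high accuracy mode: {:.1} %", high_accuracy * 100.0);
    println!("low accuracy mode: {:.1} %", low_accuracy * 100.0);
}
```

Both modes come out at roughly 20 %, consistent with the release note; since the README values are themselves approximations, the match is only approximate.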
3 changes: 3 additions & 0 deletions RELEASE_NOTES.md
@@ -25,6 +25,9 @@
 - The language model files are now compressed with the Brotli algorithm which
 reduces the file size by 15 %, on average. (#189)
 
+- The language model ngrams are now stored in a `CompactString` type which
+reduces the amount of consumed memory by 20 %. (#198)
+
 - Several performance optimizations have been applied which makes the library
 nearly twice as fast as the previous version. Big thanks go out to @serega
 and @koute for their help. (#82, #148, #177)
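The release note attributes the 20 % saving to storing ngrams in `CompactString` from the `compact_str` crate, the dependency added to Cargo.toml in this commit. That type applies a small-string optimization: a `CompactString` is the same size as a `String` (24 bytes on 64-bit targets) but keeps strings of up to 24 bytes inline, so short strings such as ngrams presumably need no separate heap allocation. The std-only sketch below illustrates the idea with a hypothetical `SmallString` enum; it is not the actual layout of `compact_str`, which is packed more tightly, and its 23-byte inline capacity is an arbitrary choice for this toy.

```rust
/// Inline capacity of the toy type below (arbitrary choice for illustration).
const INLINE_CAP: usize = 23;

/// Hypothetical small-string type illustrating the small-string optimization:
/// short strings are stored inside the value, longer ones in a heap `String`.
/// This is NOT the layout used by `compact_str::CompactString`.
#[derive(Debug, Clone, PartialEq, Eq)]
enum SmallString {
    Inline { len: u8, buf: [u8; INLINE_CAP] },
    Heap(String),
}

impl SmallString {
    fn new(s: &str) -> Self {
        if s.len() <= INLINE_CAP {
            // Copy the UTF-8 bytes into the fixed-size buffer: no heap allocation.
            let mut buf = [0u8; INLINE_CAP];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            SmallString::Inline { len: s.len() as u8, buf }
        } else {
            // Too long to inline: fall back to an ordinary heap-allocated String.
            SmallString::Heap(s.to_owned())
        }
    }

    fn as_str(&self) -> &str {
        match self {
            // Cannot fail: the bytes were copied from a valid &str.
            SmallString::Inline { len, buf } => {
                std::str::from_utf8(&buf[..*len as usize]).expect("valid UTF-8")
            }
            SmallString::Heap(s) => s.as_str(),
        }
    }

    fn is_heap_allocated(&self) -> bool {
        matches!(self, SmallString::Heap(_))
    }
}

fn main() {
    // A five-character ngram fits inline; a long sentence does not.
    for text in ["langu", "a string that is longer than twenty-three bytes"] {
        let s = SmallString::new(text);
        println!("{:?} heap-allocated: {}", s.as_str(), s.is_heap_allocated());
    }
}
```

The saving comes from avoiding one heap allocation (plus allocator overhead) per short string, which adds up when millions of ngrams are loaded.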