Optimize memory footprint of resources #151

adrienball · 2019-09-09T16:34:23Z

Description

Use the hashing trick for HashSetGazetteer, HashMapStemmer and HashMapWordClusterer, replacing raw string keys with i32 keys
HashMapWordClusterer: try to load word clusters as u16, falling back to String for all values as soon as one value cannot be converted to u16
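
As a rough illustration of the first point, here is a minimal sketch of a gazetteer that stores hashed i32 keys instead of the strings themselves. The names `hash_str_to_i32` and `HashedGazetteer` are hypothetical and not taken from this PR, and the use of `DefaultHasher` truncated to 32 bits is an assumption about how the i32 key could be derived.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: hash a string and keep the low 32 bits as an i32.
fn hash_str_to_i32(input: &str) -> i32 {
    let mut hasher = DefaultHasher::new();
    input.hash(&mut hasher);
    hasher.finish() as i32
}

/// Illustrative gazetteer that keeps only 4-byte hashed keys,
/// so the original strings are not retained in memory.
struct HashedGazetteer {
    values: HashSet<i32>,
}

impl HashedGazetteer {
    fn from_words<'a, I: IntoIterator<Item = &'a str>>(words: I) -> Self {
        Self {
            values: words.into_iter().map(hash_str_to_i32).collect(),
        }
    }

    /// Membership test on the hashed key. Two distinct words that hash
    /// to the same i32 cannot be told apart, so false positives are possible.
    fn contains(&self, word: &str) -> bool {
        self.values.contains(&hash_str_to_i32(word))
    }
}
```

With this layout, `HashedGazetteer::from_words(vec!["paris", "london"]).contains("paris")` returns `true`. The trade-off is a small risk of hash collisions in exchange for not storing each string on the heap.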

In English, with the current word clusters included in the resources, this reduces memory usage by a constant 25MB.
For other languages without word clusters, the expected reduction is between 0.5MB and 1MB.

Backward compatibility
The new implementation is backward compatible. Old word clusters, which are typically stored as hierarchical binary paths of the form "10001011001", can still be loaded. In this case, clusters are loaded as strings.
New word clusters, introduced in snipsco/snips-nlu-language-resources#33, will benefit from this improved implementation, as all clusters are u16-like.
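
The fallback described above could look roughly like the following sketch. The types `ClusterValues` and `WordClusters` and the helper `hash_str_to_i32` are hypothetical names used for illustration only, and the input is assumed to be already parsed into (word, cluster) pairs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: hash a string and keep the low 32 bits as an i32.
fn hash_str_to_i32(input: &str) -> i32 {
    let mut hasher = DefaultHasher::new();
    input.hash(&mut hasher);
    hasher.finish() as i32
}

/// Either every cluster fits in a u16, or all of them are kept as strings.
enum ClusterValues {
    U16(HashMap<i32, u16>),
    Str(HashMap<i32, String>),
}

struct WordClusters {
    values: ClusterValues,
}

impl WordClusters {
    fn from_pairs(pairs: &[(&str, &str)]) -> Self {
        // Try the compact representation first; collecting into a Result
        // stops at the first cluster that does not parse as a u16.
        let compact: Result<HashMap<i32, u16>, _> = pairs
            .iter()
            .map(|&(word, cluster)| cluster.parse::<u16>().map(|id| (hash_str_to_i32(word), id)))
            .collect();
        let values = match compact {
            Ok(map) => ClusterValues::U16(map),
            // Fallback: keep every cluster as a String, as with old resources.
            Err(_) => ClusterValues::Str(
                pairs
                    .iter()
                    .map(|&(word, cluster)| (hash_str_to_i32(word), cluster.to_string()))
                    .collect(),
            ),
        };
        Self { values }
    }

    /// Returns the cluster of a word in string form, whatever the storage.
    fn cluster(&self, word: &str) -> Option<String> {
        let key = hash_str_to_i32(word);
        match &self.values {
            ClusterValues::U16(map) => map.get(&key).map(|id| id.to_string()),
            ClusterValues::Str(map) => map.get(&key).cloned(),
        }
    }
}
```

An old-style value such as "10001011001" exceeds the u16 range, so a single such entry switches the whole map to the String variant. Note that this sketch does not preserve leading zeros when every value happens to parse as a u16.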

ClemDoum

👏

adrienball added 7 commits September 5, 2019 18:38

Load hierarchical word clusters more efficiently

c9740f1

Remove hierarchical aspect of word clusters

37b5f0a

Merge branch 'master' into task/optim-word-clusters

24d8b32

Use hashing trick to replace string by i32 in Gazetteer and Stemmer implementations

91f5ba3

Make word clusterer implementation compatible with non-u16 clusters

db36c13

Fix docstring

af98274

Add small improvements

d1e2984

adrienball requested review from fredszaq and ClemDoum September 9, 2019 16:34

ClemDoum approved these changes Sep 10, 2019

fredszaq approved these changes Sep 10, 2019

Update Changelog

2bda43c

adrienball merged commit 75f1258 into master Sep 10, 2019

adrienball deleted the task/optimize-resources-loading branch September 10, 2019 10:06

adrienball mentioned this pull request Sep 12, 2019

Release 0.65.3 #153

Merged
