Set an LRU cache for word embeddings for a 20% decrease in inference time #1084
This PR sets an LRU cache of PyTorch Tensors for word embeddings.

This approach avoids converting the most frequently used Gensim embeddings to PyTorch Tensors on every lookup and, even more importantly, avoids transferring them from CPU RAM to GPU RAM (a slow operation).

Because of Zipf's law, the effect of such a caching approach is magnified: a small number of frequent tokens accounts for most lookups.

The LRU cache size is set to 10,000 because it is very small yet still provides most of the performance boost compared to loading all embeddings into GPU RAM. A 1,000-embedding LRU cache is slightly less performant on my own dataset.
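A minimal sketch of the caching idea (not the exact code in this PR): assuming a saved Gensim `KeyedVectors` model and an available CUDA device, `functools.lru_cache` keeps the converted Tensors of the 10,000 most recently used tokens; the file name `embeddings.kv` and the function name `embedding_tensor` are illustrative.

```python
from functools import lru_cache

import torch
from gensim.models import KeyedVectors

# Hypothetical setup: a saved Gensim KeyedVectors file and the target device.
keyed_vectors = KeyedVectors.load("embeddings.kv")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

@lru_cache(maxsize=10000)
def embedding_tensor(token: str) -> torch.Tensor:
    # Convert the Gensim numpy vector to a Tensor and transfer it to GPU RAM
    # only once per token; frequent tokens (Zipf's law) hit the cache and
    # skip both slow steps on later lookups.
    return torch.tensor(keyed_vectors[token], device=device)
```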
A second small optimization is to replace a call to `unsqueeze` on each token followed by a `cat` with a single call to `stack`.

Time to process 100 French documents decreased from 33 s to 26 s with this PR.
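To illustrate the `stack` change (shapes are made up for the example), the two constructions below build the same batched tensor, but `stack` does it in a single call:

```python
import torch

token_embeddings = [torch.randn(300) for _ in range(5)]  # stand-in per-token vectors

# Before: unsqueeze every 1-D vector to shape (1, 300), then concatenate.
before = torch.cat([e.unsqueeze(0) for e in token_embeddings], dim=0)

# After: a single stack call builds the same (5, 300) tensor.
after = torch.stack(token_embeddings, dim=0)

assert torch.equal(before, after)
```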
For what it's worth, spaCy takes exactly the same time (26 s) on my dataset, with much lower accuracy on some tricky entities (and the same accuracy on the entities that are easiest to recognize).
Another change is to the NER HTML viewer, with the introduction of an HTML title parameter.