Mmap model shared between processes, memory use is not stable #2883

Closed · raphaelauv opened this issue Jul 14, 2020 · 5 comments
raphaelauv commented Jul 14, 2020

Problem description

I'm exposing a word2vec model with Gunicorn sync workers.

(I run this gunicorn process inside a single Docker container.)

With 1 worker there is no problem. But with multiple workers (forked from the main gunicorn process, with preload activated):

  • mmap works well at startup; memory use is almost the same as with a single process.

  • But when I start to query the API with a load-test program (each call executes indexer.model.wv.most_similar), memory usage grows until it stabilizes.

This is with 8 sync workers (memory is growing):
[screenshot: per-worker memory usage climbing under load]

With only one worker, the lines stay perfectly flat:
[screenshot: single-worker memory usage staying flat]

Steps/code/corpus to reproduce

Load a word2vec Gensim model with mmap activated, fork N times, then query the model in each subprocess; a minimal sketch follows below.
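For concreteness, a minimal sketch of the repro (the file path 'vectors.kv', the worker count, and the query term 'word' are placeholders, not from the issue):

import os
from gensim.models import KeyedVectors

# Load once in the parent, memory-mapped read-only, so that forked
# children can share the same pages.
wv = KeyedVectors.load('vectors.kv', mmap='r')

pids = []
for _ in range(4):
    pid = os.fork()
    if pid == 0:
        # Child: hammer the model, then exit without running cleanup.
        for _ in range(10000):
            wv.most_similar('word')
        os._exit(0)
    pids.append(pid)

for pid in pids:
    os.waitpid(pid, 0)  # parent reaps the children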

The problem was already raised on Stack Overflow, but no solution was proposed:

https://stackoverflow.com/questions/51616074/sharing-memory-for-gensims-keyedvectors-objects-between-docker-containers/51722607#51722607

What I tried:

This is the model-loading code:

from gensim.similarities.index import AnnoyIndexer
from gensim.models import KeyedVectors
from my_project.patch_annoy import annoy_load

# Monkeypatch so that AnnoyIndexer.load() calls
# self.index.load(fname, prefault=True) under the hood.
AnnoyIndexer.load = annoy_load

indexer = AnnoyIndexer()
indexer.load(LOCAL_MODEL_PATH)

# Memory-map the pre-normalized vectors, read-only.
wv = KeyedVectors.load(LOCAL_VECTORS_NORMED, mmap='r')
indexer.model = wv
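(The actual patch_annoy module isn't shown in the issue; a plausible reconstruction, relying only on annoy's public AnnoyIndex.load(fname, prefault=...) and unload(), might look like this:)

# my_project/patch_annoy.py -- hypothetical reconstruction, not the
# exact code used in the issue.
from gensim.similarities.index import AnnoyIndexer

_original_load = AnnoyIndexer.load

def annoy_load(self, fname):
    # Let gensim rebuild self.index, self.labels, etc. as usual ...
    _original_load(self, fname)
    # ... then reload the raw Annoy index with prefault=True, so the
    # whole index file is paged into RAM up front.
    self.index.unload()
    self.index.load(fname, prefault=True)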

Also, the message I get when loading the previously normalized (init_sims(replace=True)) model is:

setting ignored attribute vectors_norm to None

Versions

Linux-5.0.0-38-generic-x86_64-with-Ubuntu-19.04-disco
Python 3.7.3 (default, Oct  7 2019, 12:56:13) 
[GCC 8.3.0]
NumPy 1.17.1
SciPy 1.4.1
gensim 3.8.3
FAST_VERSION 1
gojomo (Collaborator) commented Jul 15, 2020

As this is likely some behavior specific to your use, and not necessarily a bug in gensim, it'd be better to discuss at the discussion list, unless/until some bug/feature-request emerges.

But, a few thoughts:

  • if you've already pre-normalized the loaded set of vectors, and then after an mmap='r' load (from a non-compressed save) you've done that assignment into .vectors_norm, then actual KeyedVectors.most_similar() operations won't allocate a new cached array, and indeed any number of lookups should only cause temporary memory usage (see the sketch after this list). And other processes should share the same mmapped memory (though there will still be some extra overhead for each process's word->data lookup dict).
  • But, you're using the AnnoyIndexer, and I'm less familiar with its memory usage patterns under traffic. I don't see any allowance in its load() to share an mmapped file like KeyedVectors supports. (And, you're patching it with extra project-specific code – I don't see 'prefault' in the AnnoyIndexer code so don't know what your patch is doing.) Are you sure you need to use AnnoyIndexer? It adds overhead - and how much performance benefit are you seeing versus direct full-scan queries? Even if you think it necessary – do you see the same memory behavior if you leave AnnoyIndexer out entirely? (It may be the main source of the extra-memory-usage-per-process.)
  • A lot will depend on the gunicorn mode. I haven't looked at it in ages, but if it must fork processes, it might be better to defer any word-vector loading until after a process is forked. (Even though immediately after the right kinds of process fork, the processes might share relevant only-read memory, I'm not sure further operations, including GC object relocations, would necessarily preserve that.)
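To make the first point concrete, a sketch under the assumption that the vectors were saved after init_sims(replace=True) and in uncompressed form (the path and query term are placeholders):

from gensim.models import KeyedVectors

# The save was made after init_sims(replace=True), so .vectors already
# holds the unit-normed rows.
wv = KeyedVectors.load('vectors_normed.kv', mmap='r')

# Point the norm cache at the mmapped array; otherwise the first
# most_similar() call computes and caches a fresh per-process copy.
wv.vectors_norm = wv.vectors

wv.most_similar('word')  # placeholder query; no new large allocation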

piskvorky (Owner) commented Jul 15, 2020

Another lead is just plain memory fragmentation from Python. The large numpy array will only live in RAM once (assuming nothing went wrong with mmap='r', as per @gojomo's comment), but there are a number of other data structures in KeyedVectors that will be specific to each process: key dictionaries and the like.

So even if you managed to fork processes after loading the object, so that you have just one KeyedVectors copy in RAM thanks to Linux's copy-on-write, Python will gradually force separate page copies in each process because it touches the objects during reference counting.

TL;DR: Make sure you're actually mmapping, so the big array is in RAM just once. Then check where the extra memory is creeping in from. You can use my smaps.py script, for example.
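If that script isn't at hand, a minimal sketch of the same idea, reading Linux's /proc/<pid>/smaps to total a worker's private (unshared) pages:

import sys

def private_kb(pid):
    # Private_Clean + Private_Dirty pages are not shared with any other
    # process; for a worker whose big array is truly mmap-shared, this
    # total should stay small and stable under load.
    total = 0
    with open('/proc/%s/smaps' % pid) as f:
        for line in f:
            if line.startswith(('Private_Clean:', 'Private_Dirty:')):
                total += int(line.split()[1])  # value is in kB
    return total

if __name__ == '__main__':
    print('%s kB private' % private_kb(sys.argv[1]))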

raphaelauv (Author) commented Sep 10, 2020

Hi, thanks for your answers. Yes, the KeyedVectors is loaded before the fork.

I tried to use gc.freeze() after the preload in gunicorn, but it's no better:
benoitc/gunicorn#1640 (comment)
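For reference, a sketch of that attempt; the when_ready hook in gunicorn.conf.py is one plausible place to run gc.freeze() in the master after preload (the hook choice is this sketch's assumption, not the exact code used):

# gunicorn.conf.py -- sketch of the gc.freeze() attempt
import gc

preload_app = True  # load the app (and the model) in the master

def when_ready(server):
    # Move everything allocated during preload into the permanent
    # generation, so the cyclic GC never touches those objects in the
    # forked workers and never dirties their copy-on-write pages.
    gc.freeze()
    server.log.info('Objects frozen in perm gen: %d', gc.get_freeze_count())

The startup log from that run: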

[2020-09-10 09:08:28,409] INFO in serve: Starting serving predict
[2020-09-10 09:08:31,831] INFO in utils: loading Word2VecKeyedVectors object from /opt/ml/model/keyedvector
[2020-09-10 09:08:31,959] INFO in utils: setting ignored attribute vectors_norm to None
[2020-09-10 09:08:31,959] INFO in utils: loaded /opt/ml/model/keyedvector
[2020-09-10 09:08:31,959] INFO in keyedvectors: precomputing L2-norms of word weight vectors
[2020-09-10 09:08:32,058] INFO in predictor: Keyed Vectors loaded.
[2020-09-10 09:08:32,059] INFO in predictor: Loading Annoy Index...
[2020-09-10 09:08:32,059] INFO in patch_annoy: Custom Annoy loader : prefault=True
[2020-09-10 09:08:32,063] INFO in predictor: Annoy Index loaded.
[2020-09-10 09:08:32,105] INFO in predictor: Warm index
[2020-09-10 09:08:34 +0000] [46] [INFO] Starting gunicorn 20.0.4
[2020-09-10 09:08:34 +0000] [46] [INFO] Listening at: unix:/tmp/gunicorn.sock (46)
[2020-09-10 09:08:34 +0000] [46] [INFO] Using worker: sync
Objects frozen in perm gen:  165434
[2020-09-10 09:08:34 +0000] [56] [INFO] Booting worker with pid: 56
[2020-09-10 09:08:34 +0000] [57] [INFO] Booting worker with pid: 57
[2020-09-10 09:08:34 +0000] [58] [INFO] Booting worker with pid: 58
[2020-09-10 09:08:34 +0000] [59] [INFO] Booting worker with pid: 59

gojomo (Collaborator) commented Sep 11, 2020

As noted in my response, you might get more-assured mmap-sharing by loading the KeyedVectors into each separate process.
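A sketch of that approach using gunicorn's post_fork server hook, so each worker maps the file itself instead of relying on copy-on-write from the master (my_app is a hypothetical application module; the vector path is taken from the logs above):

# gunicorn.conf.py -- sketch of per-worker loading
from gensim.models import KeyedVectors

def post_fork(server, worker):
    # Runs inside the worker just after the fork. Every worker maps the
    # same file read-only, and the OS shares the pages between them.
    import my_app  # hypothetical module that holds the global model
    my_app.wv = KeyedVectors.load('/opt/ml/model/keyedvector', mmap='r')
    my_app.wv.vectors_norm = my_app.wv.vectors  # reuse pre-normed rows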

It's hard to understand your logging without more of your full code: what code is loading the vectors & index that's not wholly before the gunicorn startup? How are the worker processes, after "booting", sharing that earlier work?

The way the increase happens after loading, gradually over use, means it could easily be other code (including your unshown "monkey patch" to the AnnoyIndexer) that's responsible. Or, caused by the AnnoyIndexer that might not even be strictly necessary. So again: do you see the same memory behavior if you leave AnnoyIndexer out entirely?

raphaelauv (Author) commented

@gojomo, again, big thanks for your help.

So I tried without the AnnoyIndexer and it works! There is no memory leak when multiprocessing with gensim alone.

Sorry for my wrong interpretation.
