Fix reindex memory leaks #2070

Merged
merged 3 commits into from Mar 19, 2019

Contributor

@noirbizarre noirbizarre commented Mar 19, 2019

This PR fixes 2 memory leaks in search indexing.

The memory profiles were sampled with the following configuration:

  • CPU: Intel Core i7-7500U
  • RAM: 16G
  • OS: Arch Linux
  • Datasets: 10000

Tooling

The profiling script is: https://gist.github.com/noirbizarre/7ea0b0526814131752e6902a4254df78

It requires:

Run it with:

mprof run -o output.dat profiling.py

Render graph with:

mprof plot output.dat -t "Title" -o output.png

Memory profile from current master

profiling-master

Max memory usage: ~4G

udata SlugField

The udata SlugField instances were keeping a reference to their parent document through self.instance. This reference was stored on the field object, which is instantiated only once, when the document class is defined, so the garbage collector was never able to collect document instances using a slug field (there was always a reference to them).

This has been fixed by refactoring the SlugField population into a signal handler and no longer keeping a static reference to the instance.
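
The leak pattern described above can be reproduced without MongoEngine. The following is a minimal sketch in plain Python, not the udata code: `LeakyField`, `CleanField`, `LeakyDoc`, `CleanDoc` and `is_collected` are hypothetical names, and the clean variant simply avoids storing the instance rather than using a signal handler as the PR does.

```python
import gc
import weakref


class LeakyField:
    """Field object created once, at class definition time (hypothetical)."""

    def __init__(self):
        self.instance = None  # strong reference to the last document seen

    def __get__(self, instance, owner):
        if instance is None:
            return self
        # The class-level field now keeps the document alive.
        self.instance = instance
        return instance.__dict__.get("slug")


class CleanField:
    """Same field, but it never stores the document on itself."""

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__.get("slug")


class LeakyDoc:
    slug = LeakyField()


class CleanDoc:
    slug = CleanField()


def is_collected(doc_cls):
    """Create a document, touch its slug, drop it, and report whether it was freed."""
    doc = doc_cls()
    doc.slug  # triggers the field's __get__
    ref = weakref.ref(doc)
    del doc
    gc.collect()
    return ref() is None


leaky_collected = is_collected(LeakyDoc)
clean_collected = is_collected(CleanDoc)
print(leaky_collected, clean_collected)
```

In this sketch the document touched through `LeakyField` stays reachable from the class-level field object after the caller has dropped it, which is the kind of reference the fix removes.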

Memory profile with the SlugField fix

profiling-fix

Max memory usage: ~3.6G

MongoEngine querysets and no_cache()

MongoEngine's default behavior is to cache the results of any queryset, just to avoid a round trip on "count then query".
This behavior causes a massive memory leak while iterating over a queryset generator.

The fix is to add a no_cache() call before the loop.

This fix might be beneficial in other parts of the code.
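
To illustrate why a result cache hurts during a long iteration, here is a toy model in plain Python. It is a sketch under stated assumptions, not MongoEngine's implementation: `ToyQuerySet`, `ToyNoCacheQuerySet` and `fake_documents` are hypothetical stand-ins, and only the `no_cache()` method name is taken from the PR text.

```python
class ToyQuerySet:
    """Toy caching queryset: remembers every document it has yielded."""

    def __init__(self, source):
        self._source = source  # callable returning a fresh iterator
        self._result_cache = []

    def __iter__(self):
        for doc in self._source():
            # Every document stays referenced for the queryset's lifetime.
            self._result_cache.append(doc)
            yield doc

    def no_cache(self):
        # Return a variant that streams documents without retaining them.
        return ToyNoCacheQuerySet(self._source)


class ToyNoCacheQuerySet:
    """Toy non-caching queryset: documents are freed once the loop moves on."""

    def __init__(self, source):
        self._source = source

    def __iter__(self):
        return iter(self._source())


def fake_documents(n=1000):
    """Hypothetical stand-in for a database cursor."""
    return lambda: ({"id": i} for i in range(n))


cached = ToyQuerySet(fake_documents())
for doc in cached:
    pass  # indexing work would happen here
retained_with_cache = len(cached._result_cache)

uncached = ToyQuerySet(fake_documents()).no_cache()
seen = sum(1 for _ in uncached)
retained_without_cache = len(getattr(uncached, "_result_cache", []))
print(retained_with_cache, seen, retained_without_cache)
```

With the cache, memory grows with the number of documents iterated; without it, each document can be released as soon as the loop body is done with it.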

Memory profile with no_cache()

profiling-fix-no-cache

Max memory usage: ~300M

MongoEngine 0.17.0

Updating to MongoEngine 0.17.0 doesn't fix the memory leak; the update was applied only to verify that it has no impact (see this PR).
As you can see, there are some memory drops and the whole reindexing takes much longer.
The update has been removed from the PR until the cause of these drops is found.

Memory profile

profiling-fix-no-cache-0.17.0

Max memory usage: ~300M

@noirbizarre noirbizarre requested a review from a team March 19, 2019 10:44
@noirbizarre noirbizarre added this to the 1.6.6 milestone Mar 19, 2019
udata/models/slug_fields.py (outdated review thread, resolved)
@noirbizarre noirbizarre merged commit fde77c3 into opendatateam:master Mar 19, 2019
@noirbizarre noirbizarre deleted the memory-leaks branch March 19, 2019 15:03

2 participants