Skip to content

Fix memory leak in _cbson._element_to_dict #316

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

messa
Copy link
Contributor

@messa messa commented Oct 20, 2016

The memory leak happens when I use document_class=RawBSONDocument, because its __inflated property calls bson._iterate_elements that in turn calls _element_to_dict. The memory leak is present only in the C extension; if it is disabled everything becomes OK.

When the default document_class is used (i.e. dict) there is no memory leak, because that uses different _cbson functions where the ref counts are handled properly (_elements_to_dict in _cbsonmodule.c).

@behackett
Copy link
Member

Thanks for the bug report and patch, I've done a bit of triage and opened https://jira.mongodb.org/browse/PYTHON-1171.

@behackett
Copy link
Member

@messa, can you change your PR to switch from OO to NN in the Py_BuildValue call instead of moving the Py_DECREF calls? That should be slightly more efficient, avoiding two calls to Py_INCREF and two calls to Py_DECREF in the common case.

Also, I'm curious why you are inflating the raw documents?

@messa messa force-pushed the messa_fix_decref_after_tuple branch from 4f57152 to 0c70230 Compare October 20, 2016 21:44
@messa
Copy link
Contributor Author

messa commented Oct 20, 2016

Changed and force-pushed (hope that's OK; previous version here).

I'm not inflating the raw documents myself. The low-level cursor batch document is inflated when the result documents are retrieved from it:

documents = cursor['firstBatch']

It's just a "shallow inflate" but it copies all the document contents so when memory leaked it eats memory pretty fast.

@messa
Copy link
Contributor Author

messa commented Oct 20, 2016

Btw. my use case for RawBSONDocument:

  • right now I've just needed a quick pre-heat script that reads all documents in a collection, without doing anything with them. I just tried raw documents to compare how much faster such script will be so I discovered this issue.
  • we have some data processing component implemented in Python with the core parts in C++ via Cython. I'm thinking about using pymongo raw documents so I can move document loading from Python to pure C++ (with C/C++ bson lib) so the Python part just transfers byte blobs from pymongo to Cython/C++ code.

@behackett
Copy link
Member

I see, we leak two references for each cursor batch regardless of the application explicitly inflating the individual documents. Sigh. Sadly, coverity doesn't appear to understand Python's reference counting at all and I completely missed this in code review back in 3.2.

Our thinking for this feature was to allow various optional BSON decoders, possibly written in languages other than Python (but with slim python wrappers). python-bsonjs was the first project of that type we wrote. Your use case is very similar.

@behackett
Copy link
Member

I've confirmed the problem, and that your patch fixes it. Patch merged to master. Thanks again!

@behackett behackett closed this Oct 20, 2016
@behackett
Copy link
Member

We've release PyMongo 3.3.1, with your fix for this issue.

https://pypi.python.org/pypi/pymongo/3.3.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants