Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use PyMem_RawMalloc on Python 3.4 and newer #7404

Merged
merged 1 commit into from
Apr 21, 2016
Merged

Use PyMem_RawMalloc on Python 3.4 and newer #7404

merged 1 commit into from
Apr 21, 2016

Conversation

vstinner
Copy link
Contributor

Change PyArray_malloc() macro to use PyMem_RawMalloc() on Python 3.4
and newer. This macro can be called indirectly from ufunc_at() which
releases the GIL, whereas PyMem_Malloc() requires the GIL to be held:
https://docs.python.org/dev/c-api/memory.html#memory-interface

PyMem_RawMalloc() can be called without the GIL.

@vstinner
Copy link
Contributor Author

Extract of the gdb traceback where PyMem_Malloc() is currently called with the GIL released:

PyMem_Malloc (size=8) at Objects/obmalloc.c:387
npyiter_allocate_buffers (iter=0xcb0280, errmsg=0x7ffffffe37a0) at numpy/core/src/multiarray/nditer_api.c:1738
NpyIter_ResetBasePointers (iter=0xcb0280, baseptrs=0x7ffffffe3780, errmsg=0x7ffffffe37a0) at numpy/core/src/multiarray/nditer_api.c:311
ufunc_at (ufunc=0x7fffeaad6220, args=(<numpy.ndarray at remote 0x7fffe876a440>, [1, 2, 3, 2], <float at remote 0x7fffe87302c8>)) at numpy/core/src/umath/ufunc_object.c:5366

@vstinner
Copy link
Contributor Author

PyMem_Malloc() is called in 3 other places:

numpy/f2py/src/fortranobject.c:159:    buf = p = (char *)PyMem_Malloc(size);
numpy/random/mtrand/Python.pxi:22:    void* PyMem_Malloc(size_t n)
numpy/random/mtrand/mtrand.pyx:916:        self.internal_state = <rk_state*>PyMem_Malloc(sizeof(rk_state))

fortranobject.c: It looks like fortran_doc() is called with the GIL held. (PyMem_Free is also called with the GIL held in this file.)

numpy/random/mtrand/mtrand.pyx: It looks like PyMem_Malloc() is also called with the GIL held, but I don't know enough Cython to double check :-/ (Same for PyMem_Free(), it also looks to be correctly used.)

@pv
Copy link
Member

pv commented Mar 10, 2016

It appears this does not fix the issue for Python <= 3.3 (in particular 2.7)?

@vstinner
Copy link
Contributor Author

It appears this does not fix the issue for Python <= 3.3 (in particular 2.7)?

In practice, PyMem_Malloc() can be called without the GIL being held until Python 3.6. But you may have issues if you play with the tracemalloc debugger or other debug tools.

It may change in Python 3.6 with my patch:
http://bugs.python.org/issue26249

With my patch, calling PyMem_Malloc() without the GIL being held can fail badly.

@seberg
Copy link
Member

seberg commented Mar 10, 2016

Do we do this normally? ufunc.at is not in the best state unfortunately, I would be surprised if we really call PyMem_Malloc without the GIL in many places, though wouldn't rule it out. If it is always safe with this patch, I guess we do not have to worry about it though.

If it can fail badly with the new behaviour, isn't there some reason why we should try to prefer using the new behaviour as well? Holding the GIL a little longer in ufunc.at is probably not an issue as well. Just curious.

@charris
Copy link
Member

charris commented Mar 11, 2016

@seberg Holding the GIL seems reasonable to me unless we need to reacquire it someplace. There are some places where memory is acquired without the GIL, but I think we use malloc in those places. @Haypo What is the advantage of using PyMem_RawMalloc()?

@charris
Copy link
Member

charris commented Mar 11, 2016

I don't see anything wrong with the patch, just asking what the best thing for us to do here is.

@vstinner
Copy link
Contributor Author

"There are some places where memory is acquired without the GIL, but I think we use malloc in those places. @Haypo What is the advantage of using PyMem_RawMalloc()?"

PyMem_RawMalloc() calls malloc() in practice. The advantage of calling PyMem_RawMalloc() instead of calling directly malloc() is that you can use tracemalloc to track memory leaks:
https://docs.python.org/dev/library/tracemalloc.html

The overhead of calling of PyMem_RawMalloc() instead of malloc() is negligible:
https://www.python.org/dev/peps/pep-0445/#performances

By the way, I'm working on an extension of the tracemalloc module for numpy:
http://bugs.python.org/issue26530

@charris
Copy link
Member

charris commented Mar 11, 2016

@Haypo That sounds good. I guess the question come down to whether it is better to fix the places where PyMem_Malloc() is called without the GIL or simply use PyMem_RawMalloc() instead. Is there any advantage to retaining PyMem_Malloc()?

@vstinner
Copy link
Contributor Author

@Haypo That sounds good. I guess the question come down to whether it is better to fix the places where PyMem_Malloc() is called without the GIL or simply use PyMem_RawMalloc() instead.

How do you plan to "fix" the code? It's not trivial to lock/unlock the GIL. If you know that in some cases, the GIL can be released, always use PyMem_RawMalloc().

You might also replace direct calls to malloc() with PyMem_RawMalloc() (on Python 3.4+) to be able to track all memory allocated by numpy using tracemalloc.

@pv
Copy link
Member

pv commented Mar 11, 2016

You can use the PyGILState_* functions to reacquire GIL. Of course, this has some performance penalty (and I don't remember whether it bombs subinterpreters, but we don't support those in any case). But it may be better to just use malloc or PyMem_RawMalloc instead.

@vstinner
Copy link
Contributor Author

You can use the PyGILState_* functions to reacquire GIL. Of course, this has some performance penalty (and I don't remember whether it bombs subinterpreters, but we don't support those in any case).

Yeah, it's likely to kill performances :-)

But it may be better to just use malloc or PyMem_RawMalloc instead.

In this case, I suggest to use PyMem_RawMalloc on Python 3.4+ or malloc() on older Python version.

I see that numpy has a PY_USE_PYMEM define (you can see it in my change).

@charris
Copy link
Member

charris commented Mar 11, 2016

Let me rephrase the question: is there an advantage to using PyMem_Malloc() when the GIL is not a problem?

@vstinner
Copy link
Contributor Author

Let me rephrase the question: is there an advantage to using PyMem_Malloc() when the GIL is not a problem?

For debuggers like tracemalloc, the tracemalloc doesn't have to acquire the GIL, so it's faster.

If my change https://bugs.python.org/issue26249 is accepted, PyMem_Malloc() will be faster than PyMem_RawMalloc() (faster than malloc()?) for allocations smaller or equal than 512 bytes.

@charris
Copy link
Member

charris commented Mar 11, 2016

@Haypo Thanks for the clarification. I ask because, IIRC, we started using more PyMem_Malloc to take advantage of the memory pool for small objects. OTOH, 512 bytes = 8 doubles, which is not a very large array. @juliantaylor Thoughts?

@vstinner
Copy link
Contributor Author

we started using more PyMem_Malloc to take advantage of the memory pool for small objects.

Do you mean the pymalloc memory allocator optimized for small objects? PyMem_Malloc() doesn't use pymalloc, only PyObject_Malloc().

My change https://bugs.python.org/issue26249 proposes to modify PyMem_Malloc() to use pymalloc too.

@juliantaylor
Copy link
Contributor

PyArray_malloc should not be used anywhere where it is really relevant, we use PyObject_malloc for objects and our own small data allocator for small arrays.
Do you have a real case where it makes a performance difference? ufunc_at is interesting though its so unoptimized malloc overhead should not even matter.

We also have our own variant of tracemalloc so thats not so relevant either. It has been discussed in the past to match that with pythons but pythons api is not sufficiently flexible for that.

@vstinner
Copy link
Contributor Author

We also have our own variant of tracemalloc so thats not so relevant either. It has been discussed in the past to match that with pythons but pythons api is not sufficiently flexible for that.

Did you see my recent issue #26530 that I wrote for numpy?

I wasn't aware that numpy has its own tool to track memory allocations. Interesting.

@njsmith
Copy link
Member

njsmith commented Mar 12, 2016

I wasn't aware that numpy has its own tool to track memory allocations. Interesting.

Well, it has its own interface for hooking the memory allocation to track it. AFAIK no-one actually uses these hooks though, so tracemalloc is interesting because it lets us free-ride on the larger python ecosystem :-)

@charris
Copy link
Member

charris commented Mar 12, 2016

lets us free-ride on the larger python ecosystem :-)

Just so. There is a definite advantage to using standard tools that people will be more familiar with and probably expect.

@charris
Copy link
Member

charris commented Mar 12, 2016

Could use a mention in the doc/release/1.12.0-notes.rst if this goes in.

@charris charris added the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Mar 12, 2016
@arigo
Copy link

arigo commented Mar 15, 2016

@charris: 512 bytes is 64 doubles, not 8.

@charris charris added this to the 1.12.0 release milestone Apr 16, 2016
@vstinner
Copy link
Contributor Author

My pull request fixes support for Python 3.6. Can it be merged?

I suggest to defer changes to hold the GIL and to call PyMem_Malloc() rather than PyMem_RawMalloc(). PyMem_Malloc() and PyMem_RawMalloc() use exactly the same memory allocator in Python 3.4 and 3.5. My change to optimize the PyMem_Malloc() is not merged in CPython 3.6 yet, so it still uses the same allocator.

@njsmith
Copy link
Member

njsmith commented Apr 20, 2016

@Haypo: sure, it sounds like maybe we should think about revisiting how PyArray_malloc gets used more generally, but that can wait for later.

If PyMem_Malloc and PyMem_RawMalloc are the same in 3.4 and earlier, then why does your patch have an #ifdef to only use RawMalloc on 3.5, as opposed to just switching unconditionally? (We do try to reduce the number of #ifdefs when possible...)

@vstinner
Copy link
Contributor Author

If PyMem_Malloc and PyMem_RawMalloc are the same in 3.4 and earlier, then why does your patch have an #ifdef to only use RawMalloc on 3.5

My patch uses PyMem_RawMalloc on Python >= 3.4. Extract of the patch:

+#  if PY_VERSION_HEX >= 0x03040000

It cannot be used on older Python versions since the function was introduced in Python 3.4 ;-)
https://docs.python.org/dev/c-api/memory.html#raw-memory-interface

@njsmith
Copy link
Member

njsmith commented Apr 20, 2016

ah, right, silly me :-)

Can you expand the comment a bit, to note that (a) numpy sometimes calls PyArray_malloc with the GIL released, (b) that's only technically legal with RawMalloc, which (c) was introduced in 3.4, and (d) earlier versions it's technically illegal but turns out to be safe in practice? Will save someone doing a bit of archeology later...

Change PyArray_malloc() macro to use PyMem_RawMalloc() on Python 3.4
and newer. This macro can be called indirectly from ufunc_at() which
releases the GIL, whereas PyMem_Malloc() requires the GIL to be held:
https://docs.python.org/dev/c-api/memory.html#memory-interface

PyMem_RawMalloc() can be called without the GIL.
@vstinner
Copy link
Contributor Author

Can you expand the comment a bit

Done. Python 3.6 has a new PYTHONMALLOC=debug env variable which will now fail with a fatal error in PyMem_Malloc() if the function is called without holding the GIL.
https://docs.python.org/dev/whatsnew/3.6.html#pythonmalloc-environment-variable

And I'm now more confident to push my change modifying PyMem_Malloc() to reuse PyObject_Malloc() allocator, and so get the fast allocator for memory blocks <= 512 bytes.

@vstinner
Copy link
Contributor Author

@njsmith: By the way, what's the status of tracemalloc support in numpy? It's not the first time that someone joined #python-fr help channel and had a question about the memory consumption of an application using numpy.

@vstinner
Copy link
Contributor Author

@njsmith: Oh, I now recall that I was waiting for your feedback on https://bugs.python.org/issue26530 to be able to track more memory allocations in numpy!

@njsmith
Copy link
Member

njsmith commented Apr 21, 2016

LGTM, thanks -- and sorry about the slowness

@njsmith njsmith merged commit 631e989 into numpy:master Apr 21, 2016
@vstinner
Copy link
Contributor Author

Thank you.

You didn't reply to my tracemalloc questions ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants