Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible memory leak in _cdecimal.c with 'z' format #114563

Closed
ericvsmith opened this issue Jan 25, 2024 · 12 comments · Fixed by #114879
Closed

Possible memory leak in _cdecimal.c with 'z' format #114563

ericvsmith opened this issue Jan 25, 2024 · 12 comments · Fixed by #114879
Labels
extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error

Comments

@ericvsmith
Copy link
Member

ericvsmith commented Jan 25, 2024

Bug report

Bug description:

See https://mail.python.org/archives/list/python-announce-list@python.org/thread/DHZROL7YYJZTPWJQ3WME4HI3Z65K2H4F/

This feature was originally implemented in #30049

I have not verified the memory leak or looked at Stefan's suggestions.

@mdickinson @belm0

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

Linked PRs

@ericvsmith ericvsmith added the type-bug An unexpected behavior, bug, or error label Jan 25, 2024
@Eclips4 Eclips4 added the extension-modules C modules in the Modules dir label Jan 25, 2024
@belm0
Copy link
Contributor

belm0 commented Jan 25, 2024

Inline quote below. Was there a thread specifically about the memory leak?

Haven't looked at it, but seems to be good news that the mpdecimal library picked up z-format support.

Additionally, distributors of Python-11 and Python-12 are advised to revert
the implementation of the z-format specifier in _decimal.c. It contains a
memory leak for large decimals and does not support the "EG" types.

mpdecimal-4.0.0 automatically supports the z-format specifier without patches
to _decimal.c.

The following patch cleanly reverts b0b836b
and applies to both Python-11 and Python-12:

https://www.bytereef.org/contrib/0001-py12-revert-z-format-specifier.patch

@JelleZijlstra
Copy link
Member

@belm0 in case you're not aware, there's some history here (which I won't get into) that may make the author of that announcement hesitant to interact directly with CPython. If we can verify the memory leak exists, we should fix it in CPython.

@ericvsmith
Copy link
Member Author

Thanks for pointing that out, @JelleZijlstra. I agree we should just try and fix it on our side. I don't know how feasible it is to update to a new mpdecimal and remove our 'z' code, if that will fix it.

@belm0
Copy link
Contributor

belm0 commented Jan 25, 2024

I recall debating defending that memory handling code in the PEP PR. I wonder if the former Python contributor (having maintained that integration code) suspected a leak by reading the code, or actually observed a leak.

@mdickinson
Copy link
Member

I can reproduce the memory leak on main with this code:

from decimal import Decimal

while True:
    d = Decimal('-.508e+41211')
    format(d, 'D=-z,.44%')

When I run this on my (macOS / Intel, FWIW) machine, RAM usage (observed with top) grows at around 80 MB per second. If I remove the 'z' from the format specifier, I no longer see a leak.

tracemalloc confirms the leak: if I run:

import decimal
import tracemalloc
tracemalloc.start()
repeat = 100

before = tracemalloc.take_snapshot()
for _ in range(repeat):
    d = decimal.Decimal('-.508e+41211')
    format(d, 'D=-z,.44%')
after = tracemalloc.take_snapshot()
top_stats = after.compare_to(before, 'lineno')
for stat in top_stats[:3]:
    print(stat)

then the first line of the output is:

/Users/mdickinson/Repositories/python/cpython/test.py:9: size=1697 KiB (+1697 KiB), count=100 (+100), average=17.0 KiB

and the size grows linearly with repeat.

@belm0 Can you reproduce the above on your machine, and if so do you have bandwidth to investigate further?

does not support the "EG" types

I don't understand / can't reproduce this: the E and G types will only ever produce a zero output (with whatever sign) for a zero input, and in the zero-input case the z flag appears to be working as expected. E.g., for the E case:

mdickinson@lovelace cpython % ./python.exe
Python 3.13.0a3+ (heads/main:a768e12f09, Jan 28 2024, 09:45:39) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from decimal import Decimal
>>> format(Decimal('-0'), '.6E')
'-0.000000E+6'
>>> format(Decimal('-0'), 'z.6E')
'0.000000E+6'

@skrah
Copy link
Contributor

skrah commented Jan 28, 2024

  • mpdecimal is documented to follow PEP-3101:

https://www.bytereef.org/mpdecimal/doc/libmpdec/assign-convert.html#to-string

  • The memory leak for static decimals that are promoted to dynamic ones if the coefficient gets too large is documented here:

https://www.bytereef.org/mpdecimal/doc/libmpdec/memory.html

  • You have to call mpd_del also on static decimals (unless they are marked constant).

  • The patches in the mail cited in the first message contain a general fallback that catches all specifiers that _pydecimal supports. Just apply:

https://www.bytereef.org/contrib/0001-main-revert-z-format-specifier.patch
https://www.bytereef.org/contrib/0002-main-fallback-to-pydecimal-format.patch

  • There is no need to upgrade to mpdecimal-4.0.0. mpdecimal is a library, distributions ship it and it is trivial to provide a nuget package (like the Windows build does for other libraries).

Finally, @JelleZijlstra, prevailing open source conventions would dictate that upstream is contacted when a new feature is desired in a library, not the other way round.

@skrah
Copy link
Contributor

skrah commented Jan 28, 2024

@mdickinson Thank you for reproducing the memory leak! "EG" was a typo (introduced while coordinating the large amount of patches) I meant "F":

>>> from _decimal import *
>>> Decimal("-6.24E-323").__format__("\U000d3219> z3,.10F")
'-0.0000000000'
>>> 
>>> from _pydecimal import *
>>> Decimal("-6.24E-323").__format__("\U000d3219> z3,.10F")
' 0.0000000000'

@skrah
Copy link
Contributor

skrah commented Jan 28, 2024

@belm0 I did not "maintain" the integration code, I am the sole author of Modules/_decimal/*, including mpdecimal.

@belm0
Copy link
Contributor

belm0 commented Jan 28, 2024

Thank you for the additional links.

prevailing open source conventions would dictate that upstream is contacted when a new feature is desired in a library, not the other way round

I neglected to consider this precisely because distributions may ship/override mpdecimal, so putting an implementation in Python seemed the only way to provide the enhancement for all cases.

@mdickinson what do you think of the fallback approach in the cited patch? It seems like less tricky code to maintain.

>>> Decimal("-6.24E-323").__format__("\U000d3219> z3,.10F")
'-0.0000000000'

Do we need a separate issue for supporting F?

@skrah
Copy link
Contributor

skrah commented Jan 28, 2024

prevailing open source conventions would dictate that upstream is contacted when a new feature is desired in a library, not the other way round

I neglected to consider this precisely because distributions may ship/override mpdecimal, so putting an implementation in Python seemed the only way to provide the enhancement for all cases.

Yes, I agree, up to a point. For example, all relevant distributions are now >= 2.5.1:

https://repology.org/project/mpdecimal/versions

So now instances of CONFIG_64 in _decimal.c could be replaced by MPD_CONFIG_64. This is unrelated to this issue, just an example that workaround code can be deleted at some point. It will take a year for them to pick up 4.0.0 of course!

I'm very glad that you mention the difficulties of the coordination issue! I have received very little empathy in 2020 when I tried to fix an incorrect (and unreleased!) patched libmpdec in and for Debian before the freeze.

Do we need a separate issue for supporting F?

No, the fallback code fixes everything including hash formatting. mpdecimal speedups will be picked up automatically when available.

@belm0
Copy link
Contributor

belm0 commented Feb 1, 2024

I opened PR #114879 based on Stefan's patches.

@mdickinson
Copy link
Member

@belm0

what do you think of the fallback approach in the cited patch?

Seems reasonable to me: it makes a lot of sense to me to separate the (somewhat) frequently-changing Python-specific formatting details from the standards-based decimal core. It's a bit ugly to have some of the formatting being done in the C code and some in Python, but that seems like the most pragmatic compromise. (Moving all formatting to Python sounds nice in theory, but would almost certainly have an unacceptable performance impact for common cases.)

Thanks for the PR. I'll review shortly.

serhiy-storchaka pushed a commit that referenced this issue Feb 12, 2024
…trings (GH-114879)

Immediate merits:
* eliminate complex workarounds for 'z' format support
  (NOTE: mpdecimal recently added 'z' support, so this becomes
  efficient in the long term.)
* fix 'z' format memory leak
* fix 'z' format applied to 'F'
* fix missing '#' format support

Suggested and prototyped by Stefan Krah.

Fixes gh-114563, gh-91060

Co-authored-by: Stefan Krah <skrah@bytereef.org>
serhiy-storchaka pushed a commit that referenced this issue Feb 12, 2024
…ormat strings (GH-114879) (GH-115353)

Immediate merits:
* eliminate complex workarounds for 'z' format support
  (NOTE: mpdecimal recently added 'z' support, so this becomes
  efficient in the long term.)
* fix 'z' format memory leak
* fix 'z' format applied to 'F'
* fix missing '#' format support

Suggested and prototyped by Stefan Krah.

Fixes gh-114563, gh-91060

(cherry picked from commit 72340d1)

Co-authored-by: John Belmonte <john@neggie.net>
Co-authored-by: Stefan Krah <skrah@bytereef.org>
belm0 added a commit to belm0/cpython that referenced this issue Feb 13, 2024
…unsupported format strings (pythonGH-114879) (pythonGH-115353)

Immediate merits:
* eliminate complex workarounds for 'z' format support
  (NOTE: mpdecimal recently added 'z' support, so this becomes
  efficient in the long term.)
* fix 'z' format memory leak
* fix 'z' format applied to 'F'
* fix missing 'GH-' format support

Suggested and prototyped by Stefan Krah.

Fixes pythongh-114563, pythongh-91060

(cherry picked from commit 72340d1)

(cherry picked from commit 09c98e4)

Co-authored-by: John Belmonte <john@neggie.net>
Co-authored-by: Stefan Krah <skrah@bytereef.org>
serhiy-storchaka pushed a commit that referenced this issue Feb 13, 2024
…ormat strings (GH-114879) (GH-115384)

Immediate merits:
* eliminate complex workarounds for 'z' format support
  (NOTE: mpdecimal recently added 'z' support, so this becomes
  efficient in the long term.)
* fix 'z' format memory leak
* fix 'z' format applied to 'F'
* fix missing 'GH-' format support

Suggested and prototyped by Stefan Krah.

Fixes gh-114563, gh-91060

(cherry picked from commit 72340d1)
(cherry picked from commit 09c98e4)

Co-authored-by: Stefan Krah <skrah@bytereef.org>
fsc-eriker pushed a commit to fsc-eriker/cpython that referenced this issue Feb 14, 2024
…rmat strings (pythonGH-114879)

Immediate merits:
* eliminate complex workarounds for 'z' format support
  (NOTE: mpdecimal recently added 'z' support, so this becomes
  efficient in the long term.)
* fix 'z' format memory leak
* fix 'z' format applied to 'F'
* fix missing '#' format support

Suggested and prototyped by Stefan Krah.

Fixes pythongh-114563, pythongh-91060

Co-authored-by: Stefan Krah <skrah@bytereef.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants