PyLong: use GMP #66121
Comments
I have implemented the PyLong interface using the GMP mpn functions. API/ABI compatibility is retained (except for longintrepr). It can be enabled by passing --enable-big-digits=gmp to ./configure. No large performance regressions have been observed for small numbers (a few operations are about 10% slower), and for large numbers some operations are a lot faster. There is also int.__gcd__, which may be used by fractions.gcd. The GIL is sometimes released; minimum number of digits for releasing the GIL:
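As context for the int.__gcd__ suggestion: int.__gcd__ is only proposed in this patch, not an existing CPython API. A small sketch of where a fast gcd pays off, using the stock math.gcd (available since Python 3.5) that later superseded fractions.gcd:

```python
# Sketch of what fractions gains from a fast gcd. int.__gcd__ exists only
# in the proposed patch; stock CPython exposes gcd via the math module.
from fractions import Fraction
from math import gcd

# Fraction normalization divides numerator and denominator by their gcd,
# so every Fraction construction performs a gcd computation.
f = Fraction(42, 56)
print(f)             # 3/4
print(gcd(42, 56))   # 14
```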
The tests for long, float, decimal, fractions, string, unicode, bytes, pickle, marshal and enum pass. The tests for int fail because the error messages are slightly different (when creating an int from bytes or bytearray, the value is not shown). The other tests I ran did not fail either. I have not tested on anything but x86-64. The following test cases yield a 42x performance improvement:
Did you mean to upload a patch?
Hi, I worked on a similar patch 6 years ago, while Python 3.0 was being developed. The summary is that using GMP makes Python slower, because most numbers are small (they fit in [-2^31; 2^31-1]) and GMP allocation is expensive. There is also a license issue: GMP's license is the GPL, which is not compatible with the Python license. If you want to work on large numbers, you can use gmpy.
That's not a common use case. Run the Python benchmark suite with your patch to see whether it has a similar overhead to my old patch.
PyLongObject is a PyVarObject containing a number of mp_limb_t's, so there is little overhead. For some operations, if the result is in [-20;256], no memory is allocated at all, and there are special code paths for 1-limb operations. I have also just finished GDB support; please test whether it works for you.
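The variable-length layout can be observed from Python without touching longintrepr: sys.getsizeof reports an allocation that grows with the magnitude of the int. The exact byte counts are implementation- and platform-specific; only the growth is asserted here.

```python
import sys

# CPython's int is a PyVarObject: the allocation grows with magnitude.
# Exact sizes differ across platforms and versions, so only the
# monotonic growth is checked, not specific numbers.
sizes = [sys.getsizeof(10 ** k) for k in (1, 100, 10000)]
print(sizes)
assert sizes[0] < sizes[1] < sizes[2]
```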
Hmm, the license (LGPL) should only matter for the Windows binaries. Even *if* the Windows binaries were built with GMP support, it would [...] I haven't looked at the patch in detail, but I don't have any [...] Of course we'd have to set up buildbots for the option etc.
After some minor optimizations, my implementation is about 1.8% slower on pystone and about 4% slower on bm_nqueens, but 4 times faster on bm_pidigits.
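For readers unfamiliar with bm_pidigits: it is exactly the kind of workload where big-integer arithmetic dominates. A tiny kernel in the same spirit (this is the classic Gibbons unbounded-spigot algorithm, not the benchmark's actual code):

```python
def pi_digits(n):
    """Return the first n decimal digits of pi (Gibbons' unbounded spigot).

    Every step multiplies and divides arbitrarily large ints, so the cost
    is dominated by big-integer arithmetic, much like bm_pidigits.
    """
    digits = []
    q, r, t, k, m, x = 1, 0, 1, 1, 3, 3
    while len(digits) < n:
        if 4 * q + r - t < m * t:
            digits.append(m)
            # Emit a digit and rescale the remaining state.
            q, r, m = 10 * q, 10 * (r - m * t), 10 * (3 * q + r) // t - 10 * m
        else:
            # Consume the next term of the series.
            q, r, t, k, m, x = (q * k, (2 * q + r) * x, t * x, k + 1,
                                (q * (7 * k + 2) + r * x) // (t * x), x + 2)
    return digits

print("".join(map(str, pi_digits(10))))  # 3141592653
```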
Please try the Python benchmark suite.
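Alongside the full benchmark suite, a minimal stand-alone timeit sketch (illustrative only, not the official suite) captures the two cases at issue: large operands, where an mpn-based backend can win, and small operands, the common case that must not regress.

```python
import timeit

# Large operands: the territory where a GMP backend is expected to shine.
big = 3 ** 10000
t_big = timeit.timeit(lambda: big * big, number=1000)

# Small operands: the common case that must not get slower.
t_small = timeit.timeit("a * b", setup="a, b = 12345, 67890", number=1000)

print(f"big:   {t_big:.6f}s")
print(f"small: {t_small:.6f}s")
```

The absolute timings are machine-dependent; what matters for this discussion is the ratio before and after the patch.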
I *do* have an objection to adding the configure option: from that point on, it means that maintaining the GMP-based long implementation is now the responsibility of the core developers, and I think that's an unnecessary maintenance burden, for an option that few users will care about. I think having two long integer implementations in the core is worse than having one.
I agree. If the GMP implementation is accepted, the old implementation must be dropped and replaced by the GMP implementation.
Agreed. I'm open to that, but it's critical that common use-cases (i.e., those *not* using 1000-digit integers!) aren't slowed down.
Note that we could probably release the GIL in the current implementation, too - we just haven't bothered adding such an optimization.
After optimization, tests on small ints (< 2**30): currently only addition, subtraction, negation and ~ are a bit slower (< 5%); most other operations are unchanged. The bitwise operators, //, %, ** and pow are faster, converting to and from strings is a bit faster, and pickle, marshal and json are faster. bm_nqueens is a bit slower; pystone is a bit faster. There are no performance regressions in the other benchmarks. When I fix +, - and ~ I will re-upload the patch.
IMO you must discuss the GMP license on the python-dev mailing list, since we said that if the GMP patch is accepted, it will not be optional and so will affect all platforms.
I think the maintenance implications of having another external dependency would also need discussion on python-dev.
Hmm. Looking back at my previous comments, I should explain my negativity a bit better.
To expand on 2: we already have a simple, highly portable, battle-tested implementation of big integers that's reasonably efficient for normal uses and requires little day-to-day maintenance. We'd be replacing that with something that's a lot more complicated, less thoroughly tested, and *not* significantly more efficient in normal use-cases.

Apart from the pain of the transition (and any refactor of this size is bound to involve some pain), I'm particularly worried about future headaches involved in maintaining the external GMP dependency: keeping up with bugfixes, managing the binary builds, etc. I anticipate that that would add quite a lot of work for the core team in general, and for those building releases in particular. (And that's another reason that we should have a python-dev discussion, so that those involved in the release process get a chance to weigh in.)

To offset that, there needs to be a clear *benefit* to making this change. A couple of specific questions for Hristo Venev:
[I'm deliberately steering clear of the licensing issues; it needs discussion, but IANAL and I have nothing useful to contribute here.]
Previous python-dev discussions:
https://mail.python.org/pipermail/python-dev/2008-November/083315.html
https://mail.python.org/pipermail/python-dev/2001-December/018967.html
[Regarding the ancient mpz module, which used to be part of Python]
... and if there's one person who's *very* well placed to comment on the ease or difficulty of keeping up with GMP/MPIR (especially on Windows), it's Case Van Horsen, who I notice has recently added himself to the nosy list. @casevh: any comments?
Disclaimer: as Mark alluded to, I maintain gmpy2. Some comments on MPIR/GMP:

For all practical purposes, building GMP on Windows requires some version of the mingw compiler toolchain. None of the performance gains of the custom assembly code are available if GMP is built with the VS compiler. When compiled with mingw, GMP supports CPU detection to automatically use code optimized for the specific instruction set. This level of optimization may not be needed for Python, though.

The MPIR fork of GMP can be built with VS. Assembly code is supported via the YASM assembler plugin. Only a single instruction set is supported by the lib/dll. gmpy2 currently uses MPIR; I've had no issues with its stability.

The mpz type has a maximum precision. IIRC, the maximum length is 2^31 bits on a 32-bit platform and 2^37 bits on a 64-bit platform. The mpn interface may or may not have the same restrictions. This might impact code that runs correctly, but slowly, with Python's normal PyLong implementation.

GMP does not handle out-of-memory situations gracefully. When GMP encounters a memory allocation failure (exceeding the limits above, or running out of scratch space), it will just abort. It is easy in gmpy2 to trigger an abort that will crash the Python interpreter.

My main concern is tightly linking the Python interpreter to a specific version of GMP (i.e. whatever version is used for the Windows builds or the version provided by the distribution). As long as gmpy2 can continue to use another version of GMP, it shouldn't matter to me.

GMP and MPIR are both licensed under LGPL v3+ (not v2+). I'll reserve any further licensing discussions for python-dev.

I'll try to test the patch this weekend; that should answer some of my questions.
I agree with all of Mark's objections. Unless there is a compelling win, Python is better-off without the maintenance, portability, and licensing issues. I have vague memories of this having been discussed a long time ago and it is unlikely that there have been any changes to the reasons for not doing it.
I've successfully tested the patch. The patch works fine but there are a couple of issues:
I've done some basic tests but I'll wait until Hristo updates the patch with his improvements before I run detailed tests.
Ouch; that's not friendly. Seems like this is part of the reason that Armin Rigo abandoned the attempt to use GMP in PyPy.
Can we close this issue?