-
-
Notifications
You must be signed in to change notification settings - Fork 29.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge C version of decimal into py3k. #51901
Comments
I've created this issue to keep track of progress in merging Stefan Krah's work on decimal in c into py3k. We've created a branch py3k-cdecimal (with merge tracking from py3k) for this work. When the branch is fully working and tested we'll consult python-dev about next steps. |
Is the intention to write Decimal.__format__ in C, too? That would be quite a bit of work, and I'm not sure I could recommend it. But I'm not sure if your plan is to get rid of all Python code or not. If your plan is to rewrite absolutely everything in C, I could help out by exposing the methods that parse format specifiers and do some of the low level formatting. They're used internally by the int, float, and str formatting code. Let me know. |
Just to clarify, no decision has yet been made on *whether* the cdecimal work should be integrated into py3k; we'll consult python-dev on this once we've got a working branch and performance information. |
To answer Eric's question: Decimal.__format__ is already implemented in Stefan's work---it looks like most of the code is in http://svn.python.org/projects/python/branches/py3k-cdecimal/Modules/cdecimal/io.c (Stefan, is this right?) |
So the new branch looks great---thanks, Stefan! I'm only just beginning to look at the code properly, though. A couple of things: (1) Could we unify test_decimal and test_cdecimal somehow? This would avoid them getting out of sync when new tests are added, and would make it clear what the differences between them are. It looks like there's currently a lot of duplicate code. (2) At some point we'll need some documentation. Even if all it says is: the cdecimal module operates identically to the decimal module, with the following exceptions... (notes on threading differences, exponent limits, correct rounding of pow, etc.) |
An approach similar to the one taken in test_warnings.py might work: write common test code as a base class, then subclass it to be run against both the C and Py versions of the module. |
Yes, formatting is completely implemented in io.c, together with quite a comprehensive test suite. I like the new Python format strings, so I wanted them in the C library, too. |
Unify test_decimal and test_cdecimal: Yes, quite possible. The diff is currently 400 lines, but it should be easy to get that down to below 100 without any loss of functionality. I'll look into that when I'm done with the 64 bit ANSI path. Documentation: Anything is welcome, even a patch that just creates a stub so I don't have to figure out where to put it. The differences are listed at the bottom of: |
Just an update. Rev.77358 should compile and run stable on the buildbot platforms except Alpha and ia64. I'm working on a default ANSI path for |
As a first step in unifying test_decimal.py and test_cdecimal.py I (1) Remove test that Decimal(x) generates a copy. (2) Add test case to formatting test. (3) Extend threading test. (4) Use Emax of 425000000 instead of 999999999 where possible. If I get an OK for the two patches, I can commit them to py3k and |
All outstanding issues mentioned here have been solved in Rev. 77696: (1) New ANSI path for unknown 64bit platforms (ia64 and Alpha build (2) Unified tests for decimal and cdecimal. (3) Documentation for cdecimal. Other improvements: (4) Added comprehensive test suite for testing the library directly. (5) Fixed warnings in Visual Studio. (6) Code formatting. |
Has the cdecimal branch kept up with the hash value changes in 3.2? Is there a still a chance that cdecimal could be merged into 3.2? |
On Wed, Nov 3, 2010 at 3:28 AM, Case Van Horsen <report@bugs.python.org> wrote:
Not sure; that's a question for Stefan.
A chance, yes; the major need is for someone with time to do a full |
An update on the progress: All development currently happens in my private mpdecimal repository. The In py3k-cdecimal, r86497 is an exact copy of mpdecimal-2.0. All buildbots Major improvements: o Full compatibility with decimal.py 3.2, including the new hash o With the new FloatOperation signal, accidental float operations o The underlying library - libmpdec - now has 100% code coverage o The module has 85% code coverage. All lines except failures o Several minor bug fixes, most of them deal with allocation Potential reviewers: I'll be happy to answer questions here or privately. IMO the best way to |
Just a couple of remarks about the diff I created: The changes to decimal.py are exploratory (i.e. done quite hastily) library/cdecimal.rst is completely out of date. The rest should be very stable. |
The latest patch is based on a relatively stable revision of 3.3. To my libmpdec
PEP-399
_decimal.c
|
So, what is the status today? _decimal looks to be huge. Does Python really need yet another multiprecision library? There is already gmpy and bigfloat, based on the heavily optimized GMP library, for example. Is it a license issue? Can't we reuse GMP/MPFR to offer a Decimal API? _decimal should maybe first be distributed as a third party library until it is really well tested and its API is really stable, until we can decide to integrate it. The patch adds __setattr__ to the Decimal class. |
It's not really another library: it's a reimplementation of the existing decimal library in C. The decimal library is *hugely* valuable to the financial world, but its slowness is a major concern. _decimal would help to address that concern.
Nope: those are for binary floating-point. Shoehorning decimal semantics on top of a binary floating-point library is a really bad idea. (Actually, that's a part of why decimal.py is slow---it's using Python's *binary* integers to store *decimal* coefficients, so that even simple addition is now a quadratic operation, thanks to the binary <-> decimal conversions involved.)
My take is that this has already happened. The only problem from my perspective is getting someone to find time to review such a massive patch. I've been wondering whether we could get away with some kind of 'statistical' review: do a large-scale review, and then instead of having someone go through every line of C code, pick a few representative sections at random and review those. If those code portions make it through the review unscathed, declare the code good and merge it in. |
Binary versus decimal
_decimal is a PEP-399 compliant C implementation of decimal.py. The underlying decimal.py is used for standard-conforming financial calculations. There is Additionally, _decimal is also heavily optimized. In fact, for small Soundness and code size
Except for a different directory structure, the cdecimal module is There have been many downloads from financial institutions, stock cdecimal *appears* to be huge because it has a test suite that The test suite now tests against both decimal.py and decNumber. The latest tests against decNumber have found 18 issues in decNumber In the past 8 months, regression tests for cdecimal-2.3 have been Review The tricky algorithms (newtondiv, invroot, sqrt-via-invroot An initial audit could certainly disregard convolute.c, crt.c, These are only needed for the number theoretic transform that kicks Context type safety
Making the context more strictly typed has instantly found a bug # This doctest has always passed:
>>> c = Context(ExtendedContext)
# But the operation is meaningless:
>>> c
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/decimal.py", line 3708, in __repr__
% vars(self))
TypeError: %d format: a number is required, not Context
>>> What is the concern about __setattr__? For *setting* contexts, speed |
Mark Dickinson <report@bugs.python.org> wrote:
The regex module is in a somewhat similar situation. If I'm interpreting this http://mail.python.org/pipermail/python-dev/2011-August/113240.html dialogue correctly, a complete audit down to the last line isn't |
It is also helped by the fact you are a core developer and we trust you to be here to do maintenance :) I think it's still probably a good idea to probe python-dev, if that hasn't already happened. |
We've been wanting this for a long time. Strong +1 from me. |
I can help with the review. Is http://bugs.python.org/review/7652/show a good starting point? I already have some comments. |
Antoine Pitrou <report@bugs.python.org> wrote:
Sure. The specification doesn't really change, so the work will hopefully
Yes, I'm planning to do that. |
Raymond Hettinger <report@bugs.python.org> wrote:
Thank you, Raymond! |
Oh, I forgot this minor detail (base 2 vs base 10). |
Amaury Forgeot d'Arc <report@bugs.python.org> wrote:
Yes, that would be great. Apart from two or three changes that I still |
Jim Jewett <report@bugs.python.org> wrote:
I failed to mention that libmpdec also has complete documentation. Perhaps http://www.bytereef.org/mpdecimal/doc/libmpdec/index.html |
Marc-Andre Lemburg <report@bugs.python.org> wrote:
Not yet. I'll try to make a list with proposed function names and post it here.
Yes, sure. |
This issue was raised by Jim on Rietveld: Currently, the order of arguments in Context.__init__() differs >>> Context()
Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999999, Emax=999999999, capitals=1, flags=[], traps=[DivisionByZero, Overflow, InvalidOperation])
>>>
>>> Context(28, ROUND_HALF_EVEN, -999999999, 999999999, 1, [], [DivisionByZero, Overflow, InvalidOperation])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.2/decimal.py", line 3774, in __init__
flags = dict([(s, int(s in flags)) for s in _signals])
File "/usr/lib/python3.2/decimal.py", line 3774, in <listcomp>
flags = dict([(s, int(s in flags)) for s in _signals])
TypeError: argument of type 'int' is not iterable
>>> I find this quite ugly. The repr() order is the good one, since I don't think Python code would initialize a context without keywords, |
On Wed, Mar 7, 2012 at 5:28 AM, Stefan Krah
That is indeed very helpful. So helpful that now I understand well Starting from that URL, I don't actually find setup.py. I am assuming (but would prefer to have the file explicitly state) I'm not sure what sort of failures building in the normal order led I didn't see any mention of the literature subdirectory, which makes I suppose a subdirectory name like "python" makes sense when you look Within the library, does io.[ch] really limit itself to ASCII? If so, Within memory.[ch], how much of this configurability is useful (or Under the Bignum section, it mentions that functions from these files Also, when it talks about "large arrays", could you clarify that it
OK, let me rephrase. Is newton division exported to users, or used _mpd_qtest_newtondiv is documented as a testcase; I would rather see
[And should therefore be available when used without python?]
Are those tests relevant to a _cdecimal built in to python itself? If
I would not have made the leap from that to "What to do if the |
Jim Jewett <report@bugs.python.org> wrote:
It's the setup.py from the Python top level directory.
I do not have access to AIX. On Windows the failures were locale related
FILEMAP.txt was intended to get people started, not to be a polished work.
OK.
There is no source code generation. I'm not sure where this idea comes Usage: ../../../python genlocale.py | ../tests/runtest -
I do mean ASCII. Please run this gdb session: diff --git a/Modules/_decimal/io.c b/Modules/_decimal/io.c
--- a/Modules/_decimal/io.c
+++ b/Modules/_decimal/io.c
@@ -245,7 +245,7 @@
if (digits > (size_t)(ctx->prec-ctx->clamp))
goto conversion_error;
}
- else if (_mpd_strneq(s, "inf", "INF", 3)) {
+ else if (strncasecmp(s, "inf", 3) == 0) {
s += 3;
if (*s == '\0' || _mpd_strneq(s, "inity", "INITY", 6)) {
/* numeric-value: infinity */ gdb ../../python
b mpd_qset_string
r
>>> locale.setlocale(locale.LC_ALL, 'tr_TR.utf8')
'tr_TR.utf8'
>>> from decimal import *
>>> Decimal("Infinity") (gdb) Clearly 'I' is ASCII and strncasecmp fails, exactly as written in the
Since it is in the libmpdec section, "User" refers to the extension writer.
OK.
But it is referring to abstract sequences or arrays of values less I thought 'Bignum' would already imply an array of machine words
It's used internally as _mpd_qbarrett_divmod(). When the coefficient has
I said in my last mail that I'll look into it.
I meant: Despite the fact that io.c might /appear/ to be specifically |
Here's a new patch that addresses several remarks: o _decimal now has __floor__ and __ceil__methods. o libmpdec is now in a separate directory. o I removed the big libmpdec test suite. This is why:
o Renamed "python" directory into "tests". o Test functions in mpdecimal.c are removed. o All files in libmpdec that aren't required for _decimal are removed. o Extra functionality in _decimal.c is commented out. This does not |
Please add --with-system-libmpdec (or --with-system-mpdecimal) option in |
Benjamin Peterson <report@bugs.python.org> wrote:
Actually the trickier instances of "inline" in the .c files are already http://hg.python.org/features/cdecimal/file/0f032cda94aa/Modules/_decimal/README.txt As I now remember, that was in fact necessary for CompCert. The "static inline" http://embeddedgurus.com/barr-code/2011/03/do-inline-function-bodies-belong-in-c-header-files/ IIRC also the Linux kernel uses "static inline" in header files. |
FWIW, I think we would be better off if this patch were merged in soon. Waiting until later in the release cycle risks introducing bugs that we won't have time to notice or fix. An early merge lets more people exercise the code. |
Stefan, please feel free to commit the latest patch. |
+1 |
Mark Dickinson <report@bugs.python.org> wrote:
OK, I'm counting three +1 for merging soon. Thanks everyone for the I'll then proceed with the merge this weekend or shortly after, My *second* self-review of mpdecimal.c is currently at 17%, but I can |
Arfrever Frehtes Taifersar Arahesis report@bugs.python.org wrote:
I've added that to the list of issues. However, if there is any change in I can probably release a matching external libmpdec with every major Python This would mean that for Python-3.3 the yet unreleased mpdecimal-2.4, |
We need to decide what to do with the different limits of the 64-bit and
default context 10**9-1 I think it would be annoying to have the values in DefaultContext, The best thing might be to use Emax=10**8-1 and Emin=-(10**8-1) throughout. |
10**6-1 would be another option. The advantage is that it's visually |
New changeset 7355550d5357 by Stefan Krah in branch 'default': |
New changeset f6d646e30028 by Stefan Krah in branch 'default': |
You may now close the issue. |
I have a couple of questions about the proposed capsule C API. In order to PyDec_Add(PyObject *a, PyObject *b, PyObject *context); To stay sane, either access macros or access functions to the context would CTX(workcontext)->prec = 100;
PyDec_SetPrec(workcontext, 100);
PyDecContext_SetPrec(workcontext, 100); Probably flags would need to be checked at some point: if (CTX(workcontext)->status & MPD_Underflow) I wonder if users would not be better off if libmpdec functions and Also, I find things like (status & MPD_Underflow) more readable than I've attached a short (fake) example to demonstrate what I mean. |
I've opened bpo-15237 for the capsule API. I didn't add everyone to the |
New changeset afdb0e1a9dac by Stefan Krah in branch 'default': |
I switched the algorithm in mpd_qsqrt() to the one from decimal.py. Curiously enough this scheme _always_ seems to produce exact results "Square-root can also be calculated by using the power operation (with a second operand of 0.5). The result in that case will not be exact in most cases, and may not be correctly rounded." Anyway, the algorithm from decimal.py gives the desired guarantees Since we're almost in beta-2 stage, would someone be able to do a |
I compared both implementations, and they are the same. I noticed that on line 7537, the call to mpd_qshiftl() may "goto malloc_error;". I think there is a memory leak in this case, "mpd_del(&c)" and 2 others lines are skipped. |
Thanks, Amaury! -- "goto malloc_error" should not leak, because there's always I use this idiom a lot ever since I saw it in several places in the Linux |
Is there some remaining work on this issue? Or can it be closed? |
I'm almost done with my (second) self-review of mpdecimal.c. The only If there are any takers, I would be very happy. For the Karatsuba functions |
My review is done. The Karatsuba function is basically a small stack machine |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: