-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use 30-bit digits instead of 15-bit digits for Python integers. #48508
Comments
Here's an experimental patch, against the py3k branch, that makes Python On platforms for which autoconf is unable to find both 32-bit and 64-bit See also bpo-1814 (GMP for longs), and the discussion at http://mail.python.org/pipermail/python-dev/2008-November/083315.html (note particularly Tim Peter's message at: http://mail.python.org/pipermail/python-dev/2008-November/083355.html |
Note that to avoid "bad marshal data" errors, you'll probably need to do a |
Here's an updated patch, with the following changes from the original:
|
I saw that you choosed to use the base 2^30 for marshal. For a better If we change the marshal format of long, the magic number should be Cons: Use a different bases makes Python slower for loading/writing |
oh yay, thanks. it looks like you did approximately what i had started (i haven't reviewed your autoconf stuff yet) As for marhsalled data and pyc compatibility, yes that is important to We should probably base the decision on which digit size to use (embedded arm, mips or ppc come to mind as obvious things to test that on) |
The problem is also that with the 30-bit digit patch, some Python will use 15 And why 30 bits and not 31 bits, or 63 bits, or 120 bits? We may use other |
[Victor Stinner]
Agreed. I'll fix this so that the .pyc format is unchanged. Thanks!
Mostly laziness: the change from 15 to 30 bits turned out to be extraordinarily easy. Note that the longobject.c part of 31 bits would involve rewriting the powering algorithm, which assumes that PyLong_SHIFT is divisible by 5. It would gain 63 (or 62, or 60) bits is simply too large right now: you'd need access to a hardware 64 x 64 -> 128 bit multiply to make 120 bits? Does GMP even go this far? I guess part of the attraction is that it's a multiple of 8... The other obvious choices to consider would be 32 bits (or 16 bits, or 64 bits). This is possible, and may even be worth I guess my feeling is simply that the 15-bit to 30-bit change seems incredibly easy and cheap: very little code change, and [Gregory Smith]
The changes to configure and pyconfig.h are just from rerunning autoconf and autoheader; it's only configure.in that should |
It was an argument for changing the base used by the mashal :-)
Powering is an simple algorithm. If it was the division, it would be much
I wrote an example to detect overflows in C on the mailing list. Copy of my Add:
int a, b, sum;
sum = a + b;
exact = ((a < 0) ^ (b < 0)) || ((a < 0) == (sum < 0));
Substract:
int a, b, diff;
diff = a + b;
exact = ((a < 0) == (b < 0)) || ((a < 0) == (diff < 0));
Multiply:
int a, b, product;
if (a == 0 || b == 0) {
product = 0; /* exact */
} else if (a != INT_MIN || (b == 1)) {
product = a * b;
exact = (product / b) == a);
} else {
/* INT_MIN * -1 = -INT_MIN: overflow */
}
Divide:
int a, b, q;
if (a != INT_MIN || b != -1) {
q = a / b; /* exact */
} else {
/* INT_MIN / -1 = -INT_MIN: overflow */
} Checking overflow may costs more than using a smaller base. Only a benchmark
Python has an amazing regression test suite! I used it to fix my GMP patch. We Anyway, i love the idea of using 30 bits instead of 15! Most computer are now I started to work with GIT. You may be interrested to work together on GIT. |
Following Victor's suggestion, here's an updated patch; same as before, |
Mark: would it be possible to keep the "2 digits" hack in
|
Yes, it uses base 2**15 but it's not the correct conversion to base |
PyLong_FromLong() doesn't go into the 1 digit special case for
negative number. You should use:
/* Fast path for single-digits ints */
if (!(abs_ival>>PyLong_SHIFT)) {
v = _PyLong_New(1);
if (v) {
Py_SIZE(v) = sign;
v->ob_digit[0] = abs_ival;
}
return (PyObject*)v;
} |
I don't understand: yes, each base 2**30 digit is converted to a pair
Can you give an example of an integer n such that marshal.dumps(n) gives |
Other responses...
Ah. I think I'm with you now. You're saying that ideally, marshal I also find the idea of making the marshal representation base 256 quite
No: the sign is stored in the size: if v is a PyLongObject then
Why don't we leave this kind of micro-optimization out until we've got
Well spotted! Yes, this should be fixed. I have a nasty feeling that I |
Here's a pybench comparison, on OS X 10.5/Core 2 Duo/gcc 4.0.1 (32-bit [create clean build of py3k branch] Highlights: SimpleLongArithmetic: around 10% faster. I'll investigate the slowdowns. |
The problem may comes from int64_t on 32 bits CPU. 32x32 -> 64 may be
Oh, I have an old patch. I will attach it to this message. About
|
I wrote a patch to compute stat about PyLong function calls. make (use setup.py): PyLong_FromLong: 168572 calls, min=( 0, ), avg=(1.4, ), max=( 3, ) pystone: PyLong_FromLong:1587652 calls, min=( 0, ), avg=(1.0, ), max=( 3, ) min/avg/max are the integer digit count (minimum, average, maximum). What can we learn from this numbers? PyLong_FromLong(), long_add() and long_compare() are the 3 most common Except PyLong_FromLong(), long_compare() and long_format(), arguments of the Biggest number is a number of 45 digits: maybe just one call to long_add(). long_bool() is never called with number bigger than 2 digits. long_sub() is never called with number bigger than 1 digit! |
And now the stat of Python patched with 30bit_longdigit3.patch. min/avg/max are now the number of bits which gives better make _FromLong: 169734 calls, min=( 0, ), avg=(11.6, ), max=( 32, ) Notes about make:
pystone _FromLong: 1504983 calls, min=( 0, ), avg=( 5.1, ), max=( 31, ) Notes about pystone:
Short summary:
|
Here's a minor update to the patch, that does some extra cleanup:
At some point I'll try to separate the pure bugfixes (missing casts, int |
Using 30bit_longdigit4.patch, I get this |
I like the idea of sys.int_info, but I would prefer a "base" attribute |
Here's a version of the 15-bit to 30-bit patch I did some testing of 100x100 digit and 1000x1000 digit On x86_64, I'm getting more spectacular results: The idea of the faster multiplication is quite simple: [Victor, please don't delete the old longdigit4.patch!] |
Just tested the patch, here are some benchmarks: ./python -m timeit -s "a=100000000;b=777777" "a//b" ./python -m timeit -s "a=100000000;b=777777" "a*b" ./python -m timeit -s "a=100000000;b=777777" "a+b" But it seems there are some outliers: ./python -m timeit -s "a=100000000**5+1;b=777777**3" "a//b" |
I wrote a small benchmark tool dedicated to integer operations (+ -
|
The most recent patch is out of date and no longer applies cleanly. I'm |
Updated patch against py3k. I'm interested in getting this into the trunk Notes:
|
Forgot to mention: you'll need to rerun autoconf and autoheader after |
Antoine, were your posted results on a 64-bit or a 32-bit system? |
On 32-bit x86 (1.4Ghz Efficeon) using gcc 4.3.2-1ubuntu12 I see the py3k: Those were from the best of five runs after a warmup loop while the baseline is entirely unpatched. |
Gregory, are you sure you didn't swap the 30-bit and 30-bit+opt results? On OS X/Core 2 Duo my timings are the other way around: 30bit is Macintosh-3:py3k-30bit-opt dickinsm$ ./python.exe ../pidigits_noprint.py |
And here are results from 64-bit builds on the same machine as above (OS X ./python.exe ../pidigits_noprint.py 2000 gives the following timings: 30-bit digits: Time; 1245.9 ms |
hmm yes, ignore my 13+optimize result. apparently that used 15bit |
Maybe configure didn't get updated properly? I'm almost sure I found a |
On all other follow-ups I agree, so no further comments there. http://codereview.appspot.com/14105/diff/1/2 http://codereview.appspot.com/14105/diff/1/2#newcode160
Ok, so I'd waive this for this patch; please do create a separate |
I think the docs say to run autoheader first, then autoconf. |
Perhaps by doing "autoreconf" instead? |
new results after fixing my longdigit13 build to use 30 bits instead of py3k: thats much saner. |
attaching an updated pidigits benchmark script that does a warmup run |
Here are the results from 32-bit x86 on core2 duo gcc 4.0.1 using 30-bit digits (14): 15719 ms (again, i had to manually add #define PYLONG_DIGIT_SIZE 30 to pyconfig.h and pybench runs on the same builds vs unpatched: 30-bit digits (14): -1.4% (slightly slower) My votes: Obviously use the optimized version (but fix the configure stuff). +0 for enabling it by default on 32bit builds. |
Before such a version gets committed, I'd like to see it on Rietveld |
The attached patch uses mul1 in long_mul in the version patched with Comparison between these two patches on hp pavilion Q8200 2.33GHz pybench patch new patch pidigits_noprint 2000 998 947 |
Sure. My original plan was to get the structural changes in first, and But now I think the x_divrem fix has to be considered a prerequisite for The x_divrem work actually simplifies the code (whilst not altering the I also want to understand Gregory's problems with configure before None of this is going to happen before the weekend, I'm afraid. |
http://codereview.appspot.com/14105/diff/1/2 http://codereview.appspot.com/14105/diff/1/2#newcode160
Done. |
Here's an updated patch that includes the x_divrem fixes and I think this is (modulo any requested changes) the version Here's a detailed list of the changes to x_divrem.
There are many other optimizations possible; I've tried only to include |
Updated benchmarks results with 30bit_longdigit17.patch:
This is very good work. |
Adding Tim Peters to the nosy list, mainly to give him an opportunity to |
It finally occurred to me that what might be killing 32-bit performance To test this, here's a version of 30bit_longdigit17.patch that replaces Results of running python pydigits_bestof.py 2000 on 32-bit OS X upatched py3k 30bit_longdigit17.patch 30bit_longdigit17+asm.patch The problem is that (e.g., in the main loop of x_divrem) we're doing a On 32-bit PowerPC things are even worse, since there there isn't even a So I could still be persuaded that 30-bit digits should only be enabled |
Okay, let's abandon 30-bit digits on 32-bit machines: it's still Here's a 'release-candidate' version of the patch: it's exactly the
I've also updated the version uploaded to Rietveld: the patchset there http://codereview.appspot.com/14105 Martin, thank you for all your help with reviewing the previous patch. Gregory, if you have time, please could you double check that the Anyone else have any objections to this going in exactly as it is? I'd |
1 comment and 1 question about 30bit_longdigit20.patch:
I prefer the patch version 20 because it's much simplier than the |
[Victor]
Yes, but it's also for Unix: PYLONG_BITS_IN_DIGIT is only set when the --
I agree with you to some extent, but I'd still prefer to leave the 15-bit Barring further objections, I'm planning to get the |
I tried the patch on a 64-bit Linux system and it's ok. |
Committed 30bit_longdigit20.patch to py3k in r70460, with some minor Thanks all for your feedback. I might get around to backporting this to 2.7 one day... |
That should be r70459, of course. |
Great!
|
Backported to trunk in r70479. Mario, thanks for the long multiplication tweaks you submitted: could you |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: