fast modular exponentiation #40160
Comments
For crypto-sized numbers, Python mod-exp is several However, there's some low-hanging fruit: this patch has
Currently, the cutoff is 35 digits (525 bits). I've I know this is platform-dependent, but I think we A couple misc. things:
Libraries like GMP and LibTomMath work the same way.
|
Logged In: YES Uploading 2nd version of longobject.diff - the only change |
Logged In: YES Notes after a brief eyeball scan: Note that the expression a & 1 == 1 groups as a & (1 == 1) in C -- comparisons have higher precedence in C than bit- a & 1 has the hoped-for effect. and is clearer anyway. Would be better to use "**" than "^" in comments when Doc changes are needed, because you're changing visible Tests are needed, especially for new semantics. l_invmod can return NULL for more than one reason, but one The Montgomery reduction gimmicks grossly complicate this You're right that int pow must deliver the same results as Pragmatics: there's a better chance of making 2.4 if the |
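A small sketch of the grouping described in the note above, written in Python purely for illustration. Python's own precedence differs from C's here (in Python, `a & 1 == 1` already means `(a & 1) == 1`), so the C parse is spelled out with explicit parentheses. The helper names `c_parse` and `intended` are hypothetical and do not come from the patch.

```python
def c_parse(a, mask):
    """Value of the C expression `a & mask == mask`.

    C groups it as `a & (mask == mask)`; the comparison yields 1,
    so the whole expression reduces to `a & 1` whatever the mask is.
    """
    return a & int(mask == mask)


def intended(a, mask):
    """The presumably intended test: 1 if every bit of mask is set in a."""
    return int((a & mask) == mask)


# With mask == 1 the two readings happen to agree, which is why the
# original expression still worked.
for a in range(8):
    assert c_parse(a, 1) == intended(a, 1) == (a & 1)

# With any other mask they can differ.
print(c_parse(2, 2), intended(2, 2))  # prints: 0 1
```

So the expression in the patch was correct by accident. Writing plain `a & 1` avoids relying on that.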
Logged In: YES Tim, thanks for the feedback. I'm uploading a new patch Unfortunately, Montgomery is the bulk of the speedup: But I could split out the negative exponent handling into a Anyways, I'd like to add more tests for the exponentiation |
Logged In: YES Pragmatics are a real problem here, Trevor. I don't foresee But there are several independent changes in this patch, and |
Logged In: YES Pragmatics isn't my strong suit... but I get your drift :-).
I've left out the code which exposes l_invmod() to the user Anyways, these are applied sequentially: Should I open new tracker items for them? |
Logged In: YES Checked in the first part of the patch, with major format Include/longintrepr.h 2.15 I don't know whether it's possible for me to get to part 2 of |
Logged In: YES Same deal with the 2nd part of the patch (major format Include/longintrepr.h 2.16 This is cool stuff (& thank you!), but I'm sorry to say I can't |
Logged In: YES Here's the 3rd part of the patch (long_mont.diff; Montgomery Note that this doesn't include negative exponent handling. |
Logged In: YES I did more code review, testing, and timing. The only As far as testing, I used the random module and GMPY to As far as timing, I updated the benchmarks with a new (The below crypto library comes with a "book" which has an |
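The comment above mentions random testing with the random module and GMPY. The following is a sketch of that style of check, not the script that was actually used. It assumes a simple pure-Python square-and-multiply reference in place of GMPY, and the names `reference_powmod` and `random_check` are hypothetical.

```python
import random


def reference_powmod(base, exp, mod):
    """Right-to-left square-and-multiply, used only as an independent reference."""
    if exp < 0 or mod <= 0:
        raise ValueError("sketch handles exp >= 0 and mod > 0 only")
    result = 1 % mod          # 1 % 1 == 0 covers the mod == 1 case
    base %= mod
    while exp:
        if exp & 1:
            result = result * base % mod
        base = base * base % mod
        exp >>= 1
    return result


def random_check(trials=200, bits=256, seed=0):
    """Compare built-in pow() with the reference on random inputs.

    Returns the list of (base, exp, mod) triples on which they disagree.
    """
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        b = rng.getrandbits(bits)
        e = rng.getrandbits(bits)
        m = rng.getrandbits(bits) | 1   # force an odd, nonzero modulus
        if pow(b, e, m) != reference_powmod(b, e, m):
            mismatches.append((b, e, m))
    return mismatches


print(random_check())  # an empty list means no disagreement was found
```

Odd moduli are forced here because that is the case Montgomery reduction applies to. Edge cases such as zero bases and exponents deserve explicit tests in addition to random ones.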
Logged In: YES oops. Good thing for random testing, carry propagation was |
Logged In: YES Montgomery has a fixed cost, so it slows down small |
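To make the fixed cost concrete, here is a minimal sketch of Montgomery exponentiation on Python integers. It is not the patch's C code: it works on whole integers rather than digit arrays, assumes an odd positive modulus, and uses `pow(m, -1, r)` (Python 3.8+) for the setup inverse. The names `mont_setup`, `mont_reduce` and `mont_pow` are chosen for illustration.

```python
def mont_setup(m):
    """One-time setup: R = 2**k > m and m_prime = -m**-1 mod R."""
    if m <= 0 or m % 2 == 0:
        raise ValueError("Montgomery reduction needs an odd positive modulus")
    k = m.bit_length()
    r = 1 << k
    m_prime = (-pow(m, -1, r)) % r
    return k, r, m_prime


def mont_reduce(t, m, k, m_prime):
    """Return t * R**-1 mod m for 0 <= t < m * R, with no division by m."""
    mask = (1 << k) - 1
    u = ((t & mask) * m_prime) & mask
    t = (t + u * m) >> k            # exact: the low k bits of t + u*m are zero
    return t - m if t >= m else t


def mont_pow(base, exp, m):
    """Compute base**exp mod m using Montgomery multiplication."""
    if exp < 0:
        raise ValueError("negative exponents are not handled in this sketch")
    k, r, m_prime = mont_setup(m)
    x = (base % m) * r % m          # convert base to Montgomery form
    acc = r % m                     # 1 in Montgomery form
    while exp:
        if exp & 1:
            acc = mont_reduce(acc * x, m, k, m_prime)
        x = mont_reduce(x * x, m, k, m_prime)
        exp >>= 1
    return mont_reduce(acc, m, k, m_prime)   # convert back


print(mont_pow(123, 123456789, 123456789) == pow(123, 123456789, 123456789))
```

The setup in `mont_setup` and the two conversions in `mont_pow` are paid once per call regardless of exponent size, which is consistent with the overhead being most visible on small inputs.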
Logged In: YES I updated this patch to CVS head, but didn't change it If I can do anything to help this make it in 2.5, let me know. |
Re-targeting for 2.6 |
Mark, as the second math guru in our team, you are probably interested |
I'll see if I can find time to look at this; I'm currently looking at |
Hi Mark, Let me know if I can give you any help with this. The original patch It appeared to be a significant speedup when I was last testing, and is |
Thanks, Trevor. I'm currently working on the 15-bit -> 30-bit digit |
By the way, I'd be interested to know if you (Trevor) have any thoughts on 30bit_longdigit13+optimizations.patch in the bpo-4258 discussion. These have been giving me some quite |
This patch still(!) applies almost perfectly cleanly to trunk. On a 64-
>>> pow(0L, 0, 9223372036854775807)
28051505152L
I haven't looked hard to figure out where this is coming from, but my My general feeling is that three-argument pow is such a little-used |
Looks like the test failure is a result of a misplaced (twodigits) cast: in function mont_reduce, replacing the line

    carry += (twodigits) ( (*aptr) + (u * (*mptr++)) );

with

    carry += *aptr + (twodigits)u * *mptr++;

fixes this. |
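The effect of the cast placement can be simulated in Python. This sketch assumes that the C `digit` type is a 32-bit unsigned integer holding 30-bit values and that `twodigits` is 64 bits wide; those widths are assumptions about the build in question. The function names are hypothetical.

```python
MASK32 = (1 << 32) - 1      # assumed width of the C `digit` type
DIGIT_MAX = (1 << 30) - 1   # assumed largest value stored in one digit


def term_cast_after(a, u, m):
    """Models (twodigits)(a + u * m): the sum wraps at 32 bits, then widens."""
    return (a + u * m) & MASK32


def term_cast_before(a, u, m):
    """Models a + (twodigits)u * m: the product is formed at full width."""
    return a + u * m        # below 2**60 for 30-bit inputs, so it fits in 64 bits


# Small operands: both forms agree, so the bug is easy to miss.
print(term_cast_after(3, 5, 7), term_cast_before(3, 5, 7))

# Large operands: the late cast has already lost the high bits.
print(term_cast_after(0, DIGIT_MAX, DIGIT_MAX) ==
      term_cast_before(0, DIGIT_MAX, DIGIT_MAX))   # prints: False
```

Under these assumptions the truncated product feeds a wrong value into `carry`, which fits the kind of incorrect result reported above.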
Here's a slightly modified version of Trevor's patch:
The rest of the patch looks fine to me, modulo some minor style issues. Two points:
|
Here's a second revision of Trevor's patch:
|
Some timings on my machine (OS X 10.6, 64-bit nondebug build, trunk r77157). These are just

Without the patch:
Mark-Dickinsons-MacBook-Pro:trunk dickinsm$ ./python.exe ../time_powmod.py

With the patch:
Mark-Dickinsons-MacBook-Pro:trunk-issue936813 dickinsm$ ./python.exe ../time_powmod.py

So I'm seeing a speedup of 20-30%. I've attached the (rather primitive) timing script. Anyone else want to contribute timings? |
Hmm. For smaller inputs, I'm actually getting significant slowdowns:

Unpatched:
>>> timeit('pow(123, 123456789, 123456789L)')
7.355183839797974

Patched:
>>> timeit('pow(123, 123456789, 123456789L)')
8.873976945877075 |
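For anyone wanting to repeat this kind of comparison on a current interpreter, here is a rough Python 3 sketch (no `L` suffix). It is not the attached time_powmod.py script. The helper `time_powmod` and the 1024-bit operands are arbitrary choices for illustration, and absolute numbers will vary by machine and build.

```python
import timeit


def time_powmod(base, exp, mod, number=1000):
    """Total seconds for `number` calls of three-argument pow()."""
    return timeit.timeit(lambda: pow(base, exp, mod), number=number)


# Small operands, as in the session above (fewer repetitions than
# timeit's default of one million).
small = time_powmod(123, 123456789, 123456789)

# Crypto-sized operands; the modulus is just an arbitrary odd 1024-bit number.
big_mod = (1 << 1024) - 159
large = time_powmod(3, big_mod - 1, big_mod, number=20)

print(f"small operands: {small:.6f}s per 1000 calls")
print(f"1024-bit operands: {large:.6f}s per 20 calls")
```

Running the same script under a patched and an unpatched build gives the two columns to compare. Timing both size ranges matters, since the thread reports a speedup for large inputs and a slowdown for small ones.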
One more lot of timings, from Trevor's pow_benchmark.txt: Unpatched Patched (percent speedup) I'm not quite sure why I'm not seeing the same level of speedup that Trevor originally |
Okay, I retested the original patch without any of my refactoring (besides On a 32-bit non-debug trunk build (still on OS X 10.6), I get: Unpatched Mark-Dickinsons-MacBook-Pro:trunk dickinsm$ ./python.exe Patched Mark-Dickinsons-MacBook-Pro:trunk-issue936813 dickinsm$ ./python.exe I must admit I was hoping for a bit more than this. IMO, these speedups |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.