long int bitwise ops speedup (patch included) #41337
The 'inner loop' for applying bitwise ops to longs is quite inefficient; the attached diff improves it. Operations on longs of a few thousand bits appear noticeably faster with the patch.
I originally timed this on a Cygwin system; I've since found ...
Patch review: on Windows using MSC 6.0, I could only reproduce a smaller speedup than reported. While the patch is short, it adds quite a bit of complexity. Unless you have important use cases and feel strongly about it ... An alternative is to submit a patch that limits its scope to ...
I started by just factoring out the inner switch loop, but then went a bit further. I see a repeatable 1.5x speedup at 300 bits, which seems worth having. One use case is that you can simulate an array of bits with a long. IMHO, I don't think the changed code is more complex; it's a restructuring of the existing loop. I see a lot of effort being expended on much more complex optimizations elsewhere ...
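For readers who haven't looked at the diff: the idea of "factoring out the inner switch loop" can be sketched roughly as below. This is an illustrative sketch with invented names, not the code from the attached patch; "digit" stands in for CPython's internal digit type.

#include <stddef.h>

typedef unsigned short digit;   /* stand-in for CPython's digit type */

/* Apply a bitwise op digit-by-digit.  The switch on the operator is
 * evaluated once, outside the loop, instead of once per digit. */
static void
bitwise_digits(char op, digit *z, const digit *a, const digit *b, size_t n)
{
    size_t i;
    switch (op) {
    case '&':
        for (i = 0; i < n; i++) z[i] = a[i] & b[i];
        break;
    case '|':
        for (i = 0; i < n; i++) z[i] = a[i] | b[i];
        break;
    case '^':
        for (i = 0; i < n; i++) z[i] = a[i] ^ b[i];
        break;
    }
}

Compared with re-testing the operator inside the loop body, this gives the compiler three tight loops it can optimize independently.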
Assigned to me; changed Category to Performance.
Tim, what is your view on this patch?
It's probably useful for Python 3.0 since the old int type is gone.
Actually, my view for 3.x is this: I do agree hugely with the point above ... I did spend some time delving into Python internals years ago, but ... (1) integers all the same size, allocated from a linked-list pool instead of the general allocator ... It would seem to me that a more suitable implementation would be to ... I know there's a lot of code in that module, virtually all of which ...
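Judging from the follow-up below about 32-bit limbs and sign-magnitude versus two's complement, the proposal is a representation built on machine-word limbs. A rough sketch of what that might look like, with invented names (this is not a design taken from the thread):

#include <stdint.h>

/* Hypothetical limb-based layout, for illustration only. */
typedef struct {
    int       nlimbs;   /* limbs in use, least-significant first */
    uint32_t *limbs;    /* two's complement: the value is negative iff
                           the top bit of limbs[nlimbs-1] is set */
} bigint;

/* Virtual limb i, with the sign extended indefinitely: reads past the
 * stored limbs return all-zero words for non-negative values and
 * all-one words for negative ones.  With this, a bitwise op over
 * max(na, nb) limbs needs no other sign handling. */
static uint32_t
limb_at(const bigint *x, int i)
{
    if (x->nlimbs == 0)
        return 0;
    if (i < x->nlimbs)
        return x->limbs[i];
    return (x->limbs[x->nlimbs - 1] & 0x80000000u) ? 0xFFFFFFFFu : 0;
}

The contrast with the current sign-magnitude digits is that negative operands never need to be complemented before or after a bitwise operation.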
The type is an important performance factor, but most uses of it are for small ints (< 2**32 or 2**64).
Yes, they are. As a result, calculations on small ints have become a bit slower.
[Greg]

Do you know of any publicly available code that takes this approach, that we could look at? For 32-bit limbs (on a 32-bit platform, say, with no C99 support and no 64-bit integer type available), how do you deal with the carries efficiently? The sign-magnitude versus two's complement question is an orthogonal issue, I think.

Here's the example promised earlier: yesterday I wanted to add two 128-bit unsigned integers in C.

#include <stdint.h>

/* *sumhigh:*sumlow = ahigh:alow + bhigh:blow */
void
add_128(uint64_t *sumhigh, uint64_t *sumlow,
        uint64_t ahigh, uint64_t alow,
        uint64_t bhigh, uint64_t blow)
{
    alow += blow;
    ahigh += bhigh + (alow < blow);  /* (alow < blow) detects the carry out of the low word */
    *sumlow = alow;
    *sumhigh = ahigh;
}

Ideally, the compiler would manage to optimize this to a simple 'addq, adcq' pair. Here's what gcc-4.4 with -O3 actually generates for _add_128: ... (Here it looks like alow and blow are in r9 and rcx, ahigh and bhigh are ...) How do you write the C code in such a way that gcc produces the right instructions?
Here's the inline assembly version, for comparison:

/* addq adds the low words and sets the carry flag; adcq then adds the
 * high words plus that carry. */
#define SUM2_BIN64(sumhigh, sumlow, ahigh, alow, bhigh, blow) \
    __asm__ ("addq\t%5, %1\n\t" \
             "adcq\t%3, %0" \
             : "=r" (sumhigh), "=&r" (sumlow) \
             : "0" ((uint64_t)(ahigh)), "rm" ((uint64_t)(bhigh)), \
               "%1" ((uint64_t)(alow)), "rm" ((uint64_t)(blow)) \
             : "cc")

void
add_128_asm(uint64_t *sumhigh, uint64_t *sumlow,
            uint64_t ahigh, uint64_t alow,
            uint64_t bhigh, uint64_t blow)
{
    SUM2_BIN64(ahigh, alow, ahigh, alow, bhigh, blow);
    *sumlow = alow;
    *sumhigh = ahigh;
}

And the generated output (again gcc-4.4 with -O3) for _add_128_asm: ...
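Not part of the original message, but a quick harness for the two functions above (x86-64, gcc or clang, since the macro uses GNU inline assembly) that checks they agree on a case that carries into the high word:

#include <stdint.h>
#include <stdio.h>

/* Assumes add_128() and add_128_asm() as defined in the comment above. */
int main(void)
{
    uint64_t hi1, lo1, hi2, lo2;

    /* (2**64 - 1) + 1 must carry into the high word: result is 2**64. */
    add_128(&hi1, &lo1, 0, UINT64_MAX, 0, 1);
    add_128_asm(&hi2, &lo2, 0, UINT64_MAX, 0, 1);

    printf("C:   high=%llu low=%llu\n", (unsigned long long)hi1, (unsigned long long)lo1);
    printf("asm: high=%llu low=%llu\n", (unsigned long long)hi2, (unsigned long long)lo2);

    return !(hi1 == hi2 && lo1 == lo2);   /* 0 when the two versions agree */
}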
Antoine, "most uses of it are for small ints (< 2**32 or 2**64), ", I don't see a problem with sign-magnitude for old long ints, or for GMP, Mark, what you've written looks fine to me. It would be a bit faster (And, for speed freaks who want to use long ints to implement large bits The 'algorithmic C' package As for multiplies and divides - it's certainly possible to proceed I would be happy to do a design doc for this and write some of the inner |
I think the only realistic chance in which something may change is that ... It's not being done (to my knowledge), for the reason alone that it ... I notice that such discussion is off-topic for this bug report, which is about a specific patch.
Hmm. I agree this isn't ideal, and I now see the attraction of the alternative. It would be interesting to see timings from such an approach.
I think Greg's patch looks fine, modulo updating it to apply cleanly to the current trunk. I couldn't resist tinkering a bit, though: factoring out the complement operations ... Here's the patch.
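As background on why complement operations appear in the bitwise code at all: Python gives the bitwise operators two's-complement semantics on negative values while storing longs in sign-magnitude form, so negative operands are handled by complementing. The identities involved are shown below with ordinary C ints purely for illustration (this is not the content of the patch above):

#include <assert.h>

int main(void)
{
    int a = -1234, b = 5678;

    /* ~x == -(x+1): the complement of a negative value is a
     * non-negative value that digit-level code can work with. */
    assert(~a == -(a + 1));

    /* De Morgan lets the complements be pushed back out of the result,
     * so each case reduces to an operation on non-negative operands
     * followed by at most one final complement: */
    assert((a & b) == ~(~a | ~b));
    assert((a | b) == ~(~a & ~b));
    assert((a ^ b) == ~(~a ^ b));

    return 0;
}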
Mark, if you want to get reviews, it might be useful to upload the patch to the code review tool.
Applied in r75697 (trunk) and r75698 (py3k).