Reduce hash collisions for objects with no __hash__ method #49436
Comments
In the bpo-5169 discussion, Antoine Pitrou suggested that for an object with no __hash__ method, the default address-based hash could be improved: allocations are aligned, so the low bits of the pointer are always zero and cause avoidable collisions in power-of-2-sized hash tables. Here's a patch. |
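[Editorial note] A minimal sketch of the problem under discussion, using synthetic 16-byte-aligned "addresses" (hypothetical values, not the actual patch): when the hash is the raw pointer, every object lands in the same dict bucket; rotating the low bits away spreads them out.

```python
def naive_hash(addr):
    # Pre-patch behaviour: the hash is just the pointer value.
    return addr

def rotated_hash(addr, bits=64):
    # Rotate right by 4 so the always-zero alignment bits no longer
    # decide the dict bucket (hash % table_size).
    return ((addr >> 4) | (addr << (bits - 4))) & ((1 << bits) - 1)

addrs = [0x1000 + 16 * i for i in range(8)]  # 16-byte-aligned "pointers"
buckets = 8                                  # a small, power-of-2 dict table
assert len({naive_hash(a) % buckets for a in addrs}) == 1    # all 8 collide
assert len({rotated_hash(a) % buckets for a in addrs}) == 8  # no collisions
```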
Here are some timings for dict creation, created with the attached script. Typical results on my machine (OS X 10.5.6/Intel), 32-bit non-debug build, before:

dict creation (selected): 1.95572495461

and after:

dict creation (selected): 1.7055079937  # (14% speedup)

BTW, all tests pass on my machine with this patch applied. |
The code path for SIZEOF_LONG < SIZEOF_VOID_P could probably also be improved, though it's hard to verify (unfortunately it seems the 64-bit Windows buildbot has disappeared). |
Benchmark results on my machine (64-bit Linux, gcc 4.3.2, AMD X2 3600+): Before: After: |
I observe even greater speedups (15%/20%/37%) on set creation. Here is |
Some comments, while I remember:

y = _Py_HashPointer((void*)(a->m_ml->ml_meth));

in meth_hash in methodobject.c, for example; here ml_meth is a C function pointer.
|
Some tests on py3k (32-bit build):

>>> l = [object() for i in range(20)]
>>> [id(l[i+1]) - id(l[i]) for i in range(len(l)-1)]
[16, -96, 104, 8, 8, 8, 8, 8, -749528, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8]
>>> class C(object):
... __slots__ = ()
...
>>> l = [C() for i in range(20)]
>>> [id(l[i+1]) - id(l[i]) for i in range(len(l)-1)]
[-104, 24, 8, 8, 8, 8, 8, 8, 8, 8, 8, 16, 8, 8, 8, 8, 16, -8, 16]
>>> class C(object):
... __slots__ = ('x')
...
>>> l = [C() for i in range(20)]
>>> [id(l[i+1]) - id(l[i]) for i in range(len(l)-1)]
[432, 24, -384, 408, 24, 24, -480, 528, 24, 24, 24, 24, 48, -360, 504,
24, -480, 552, 24]

So, as soon as a user-defined type isn't totally trivial, it is allocated with a much larger granularity, and the low bits of id() carry even less information.

Note: a 64-bit build shows an even greater allocation unit:

>>> class C(object):
... __slots__ = ('x')
...
>>> l = [C() for i in range(20)]
>>> [id(l[i+1]) - id(l[i]) for i in range(len(l)-1)]
[56, -112, 168, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56,
56, 56]

I wonder why the allocation unit is 56 and not 48 (2*24). |
Le mardi 10 février 2009 à 21:18 +0000, Antoine Pitrou a écrit :
> Note: a 64-bit build shows an even greater allocation unit:
>
> >>> class C(object):
> ... __slots__ = ('x')
> ...
> >>> l = [C() for i in range(20)]
> >>> [id(l[i+1]) - id(l[i]) for i in range(len(l)-1)]
> [56, -112, 168, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56,
> 56, 56]
>
> I wonder why the allocation unit is 56 and not 48 (2*24).

I have found the answer. The PyGC_Head forces its own alignment using a union with a long double, which pads it out and enlarges the allocation unit. |
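[Editorial note] If that explanation is right, the arithmetic works out as follows. The sizes below are assumptions for this 64-bit build (three pointer-sized PyGC_Head fields, a 16-byte-aligned long double, a 24-byte instance struct), not measurements:

```python
LONG_DOUBLE_ALIGN = 16   # alignment of the long double in the PyGC_Head union
GC_HEAD_DATA = 24        # three pointer-sized fields on a 64-bit build
OBJECT_STRUCT = 24       # refcount + type pointer + one slot

def round_up(n, align):
    # Round n up to the next multiple of align (standard padding rule).
    return (n + align - 1) // align * align

gc_head = round_up(GC_HEAD_DATA, LONG_DOUBLE_ALIGN)
assert gc_head == 32                   # padded from 24 to 32 bytes
assert gc_head + OBJECT_STRUCT == 56   # the observed allocation unit
```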
On my 64-bit linux box there's nothing in the last 4 bits:

>>> [id(o)%16 for o in [object() for i in range(128)]]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0]

And with a bit more complicated functions I can determine how much shift works best:

def a(size, shift):
    return len(set((id(o) >> shift) % (size * 2)
                   for o in [object() for i in range(size)]))

def b(size):
    return [a(size, shift) for shift in range(11)]

def c():
    for i in range(1, 9):
        size = 2**i
        x = ', '.join('% 3s' % count for count in b(size))
        print('% 3s: %s' % (size, x))

>>> c()
2: 1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2
4: 1, 1, 2, 3, 4, 3, 2, 4, 4, 3, 2
8: 1, 2, 4, 6, 6, 7, 8, 6, 4, 3, 2
16: 2, 4, 7, 9, 12, 13, 12, 8, 5, 3, 2
32: 4, 8, 14, 23, 30, 25, 19, 12, 7, 4, 2
64: 8, 16, 32, 55, 64, 38, 22, 13, 8, 4, 2
128: 16, 32, 64, 114, 128, 71, 39, 22, 12, 6, 3
256: 32, 64, 128, 242, 242, 123, 71, 38, 20, 10, 5

The fifth column (i.e. 4 bits of shift, a divide by 16) works the best. .. although, if I replace object() with list() I get best results with a different shift. We may want something more complicated. |
Upon further inspection, although a shift of 4 (on a 64-bit linux box) looked best above, the picture is less clear-cut:

>>> c()
2: 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2
4: 1, 1, 2, 3, 4, 3, 3, 2, 2, 2, 3
8: 1, 2, 4, 7, 8, 7, 5, 6, 7, 5, 5
16: 2, 4, 7, 11, 16, 15, 12, 14, 15, 9, 7
32: 3, 5, 10, 18, 31, 30, 30, 30, 31, 20, 12
64: 8, 14, 23, 36, 47, 54, 59, 59, 61, 37, 21
128: 16, 32, 58, 83, 118, 100, 110, 114, 126, 73, 41
256: 32, 64, 128, 195, 227, 197, 203, 240, 253, 150, 78 |
Instead of a shift, how about a rotate or byteswap, in case the lower bits ever become meaningful? |
The alignment requirements (long double) make it impossible to have anything in those low bits today.

Hypothetically, a custom allocator could lower the alignment requirement.

Note that mixing the bits back in, via XOR or similar, is actually more expensive than simply rotating them away. |
[Adam Olsen]
Not necessarily, since not all pointers passed to _Py_HashPointer come from the object allocator:

Python 2.7a0 (trunk:69516, Feb 11 2009, 10:43:51)
[GCC 4.2.1 (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from hashlib import sha224
>>> hash(sha224)
47100514454970
>>> hash(sha224) % 16
10
<nitpick> If I recall correctly, the higher bits of the hash value also get used, via the perturbation in the dict probing loop. </nitpick>

But I agree that mixing the bottom bits back in (via rotate, or xor, or whatever) is worthwhile. |
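[Editorial note] A simplified sketch of the probing being alluded to: the initial slot uses only the low hash bits, but `perturb` gradually feeds the high bits into the probe sequence. The recurrence is paraphrased from Objects/dictobject.c, not copied verbatim:

```python
PERTURB_SHIFT = 5  # the constant used by CPython's dict

def probe_indices(h, mask, n):
    # Open-addressing probe sequence for hash h in a table of size mask+1.
    perturb = h
    i = h & mask
    out = [i]
    for _ in range(n - 1):
        i = (5 * i + perturb + 1) & mask
        perturb >>= PERTURB_SHIFT
        out.append(i)
    return out

# Two hashes that agree in their low bits diverge once the high bits
# enter through perturb:
a = probe_indices(0x12340, mask=7, n=4)
b = probe_indices(0xABCD0, mask=7, n=4)
assert a[0] == b[0] and a != b
```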
Le mercredi 11 février 2009 à 03:31 +0000, Adam Olsen a écrit :
But list() and dict() don't use id() for hash. |
Here's an updated patch, that errs on the conservative side:
All tests pass with the patch applied. I've left the 'convert to size_t' question aside for now.

Sample timings on 64-bit linux (non-debug trunk build, Core 2 Duo), before:

dict creation (selected): 1.18751096725

and after:

dict creation (selected): 1.06817317009 |
Antoine, I only meant list() and dict() to be an example of objects whose hash is not based on id().

Mark, I concede the point about rotating; I believe the cost on x86 is negligible.

Why are you still only rotating 3 bits? My results were better with 4.

Also, would the use of size_t make this code simpler? It should be the same size as a pointer on the platforms we care about. |
I'm fine with rotating 4 bits instead of 3, especially if the timings look good.

We should really benchmark dict lookups (and set membership tests) as well as creation. |
At four bits, you may be throwing away information, and I don't think that's a good idea.

I'm *much* more comfortable with a byte-swap, rotation, or xoring-in of the low bits: transformations that don't lose anything. |
Would (id() >> 4) + (id() & 15) be ok? |
The current patch *does* do a rotation: it doesn't throw away any bits. |
Here is an updated patch. It uses a 4-bit shift and an addition. |
Here is an updated benchmark script, for both object() and an |
Guys, let's be careful. Make sure that efforts to randomize the lower bits don't introduce collisions that weren't there before:

>>> len(set((x >> 4) + (x & 15) for x in range(10**6)))
62515 |
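[Editorial note] Tim's figure is reproducible: on a dense range, the shift-and-add collapses the keyspace roughly 16-fold, whereas a true rotate is a bijection and loses nothing. The rotate below is an illustrative fixed-width 64-bit version, not the committed patch:

```python
W = 64

def fold(x):
    # The proposed (id() >> 4) + (id() & 15)
    return (x >> 4) + (x & 15)

def rotr4(x):
    # 4-bit right rotate on a W-bit word: invertible, hence collision-free
    return ((x >> 4) | (x << (W - 4))) & ((1 << W) - 1)

N = 10**6
assert len({fold(x) for x in range(N)}) == 62515  # the figure quoted above
assert len({rotr4(x) for x in range(N)}) == N     # no information lost
```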
Ok, updated patch:
|
On the contrary, the expected collision rate for a half-full dictionary is well understood.

However, you're right that it's only one use case. Creating a list of 100000 objects, then shuffling and picking a few, would exercise the hash differently.

That said, I still prefer the simplicity of a rotate. |
[Antoine]
pointer_hash4.patch looks fine to me. Still, I think it's worth benchmarking the alternatives.

[Adam]
It's easy enough to prove (just show that the function is reversible):

assert len(set(ids)) == len(set(map(f, set(ids))))  # for any large set of ids |
"x |= x>>4" Are you (Ray) sure you didn't mean "x ^= x>>4" ? |
Testing with a large set of ids is a good demonstration, but not proof.

However, XOR does work (OR definitely does not): it's a 1-to-1 transformation, so it cannot introduce collisions.

Additionally, it still gives the unnaturally low collision rate when applied to a dense range of values. |
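[Editorial note] The 1-to-1 claim for XOR can be checked directly: x ^= x >> 4 on a fixed-width word has an explicit inverse, while the OR variant visibly merges distinct inputs. Sketch with an assumed 64-bit width:

```python
W = 64
MASK = (1 << W) - 1

def mix(x):
    return (x ^ (x >> 4)) & MASK

def unmix(y):
    # Invert by fixed-point iteration: the top 4 bits of x survive in y
    # untouched, and each round recovers the next 4 bits below them.
    x = y
    for _ in range(W // 4):
        x = y ^ (x >> 4)
    return x & MASK

for x in (0, 1, 0xDEADBEEF, 123456789, MASK):
    assert unmix(mix(x)) == x  # the XOR variant is reversible

# OR, by contrast, collides: two distinct inputs, one output.
assert (0x10 | (0x10 >> 4)) == (0x11 | (0x11 >> 4))
```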
David, yes, I did mean x ^= x>>4; |
Just out of interest, why? The cast is unnecessary: there's no ambiguity in the arithmetic here.

Other than that, the patch looks fine to me; x ^= x >> 4 would be fine too. |
Mark:

Well, I had memories of a weird signed/unsigned problem (bpo-4935) and decided to play it safe.

Raymond:

The special casing is just there so as to make all pointer bits contribute to the hash. |
I'm 99.9% sure that it's not a problem here. If it is a problem, the buildbots will tell us soon enough.

The relevant section of the C99 standard is 6.3.1.8, para. 1; the tricky case is

low_rank_unsigned_integer binary_op higher_rank_signed_integer

where the type of the expression depends on the relative sizes (not just the ranks) of the integer types involved.

I guess it comes down to personal taste, but my own preference is to avoid unnecessary casts. |
I've just tried x ^= x >> 4 and the speedup is smaller on our benchmarks. |
+1 for checking in pointer_hash4.patch, provided Raymond approves. |
Consider it approved. Though I prefer you switch to x ^= x >> 4. |
Okay, how about this one? Short and sweet. No loss of information |
Apart from the misindentation (the file should use tabs not spaces), |
Antoine, x ^= x>>4 has a higher collision rate than just a rotate. If you modify the benchmark to randomly discard 90% of its contents, this becomes visible.

Here's the results I got (I used shift, too lazy to rotate):

Not massive, but still worth fixing the hash. |
Apologies. My fault for editing Python files while at work, with an editor configured for spaces.

I have now. See attached file for 3 sets of results (original, xor version, rotate version).

Summary: rotate is uniformly and significantly faster than xor; xor is in turn faster than the original. |
Ok, so the rotate version is really significantly faster (and, as Adam showed, it also behaves better on collisions). |
Antoine, please check in a patch of your choice. I think we've beaten this issue to death. |
pointer_hash5_rotate.patch has been committed to trunk and py3k. Thanks! |
Sweet! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.