Reduce hash collisions for objects with no __hash__ method #49436

Closed
mdickinson opened this issue Feb 8, 2009 · 43 comments

Labels: interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage)
@mdickinson (Member)

BPO 5186
Nosy @rhettinger, @jcea, @mdickinson, @pitrou
Files
  • pointer_hash.patch
  • pointer_hash2.patch
  • pointer_hash3.patch
  • time_object_hash.py
  • pointer_hash4.patch
  • linux_x86_64_timings.txt
  • pointer_hash5_xor.patch
  • pointer_hash5_rotate.patch
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/pitrou'
    closed_at = <Date 2009-02-13.14:04:34.963>
    created_at = <Date 2009-02-08.16:36:16.581>
    labels = ['interpreter-core', 'performance']
    title = 'Reduce hash collisions for objects with no __hash__ method'
    updated_at = <Date 2009-02-13.20:23:29.133>
    user = 'https://github.com/mdickinson'

    bugs.python.org fields:

    activity = <Date 2009-02-13.20:23:29.133>
    actor = 'rhettinger'
    assignee = 'pitrou'
    closed = True
    closed_date = <Date 2009-02-13.14:04:34.963>
    closer = 'pitrou'
    components = ['Interpreter Core']
    creation = <Date 2009-02-08.16:36:16.581>
    creator = 'mark.dickinson'
    dependencies = []
    files = ['12977', '13026', '13038', '13039', '13040', '13068', '13069', '13070']
    hgrepos = []
    issue_num = 5186
    keywords = ['patch']
    message_count = 43.0
    messages = ['81389', '81390', '81396', '81398', '81400', '81599', '81602', '81606', '81619', '81620', '81625', '81635', '81637', '81639', '81640', '81670', '81672', '81678', '81679', '81681', '81682', '81683', '81685', '81689', '81700', '81705', '81709', '81714', '81718', '81734', '81737', '81744', '81750', '81755', '81759', '81768', '81778', '81822', '81921', '81923', '81927', '81929', '81970']
    nosy_count = 7.0
    nosy_names = ['rhettinger', 'jcea', 'chemacortes', 'mark.dickinson', 'Rhamphoryncus', 'pitrou', 'LambertDW']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue5186'
    versions = ['Python 3.1', 'Python 2.7']

    @mdickinson (Member Author)

    In the bpo-5169 discussion, Antoine Pitrou suggested that for an object
    x without a __hash__ method, id()/8 might be a better hash value than
    id(), since dicts use the low-order bits of the hash as the initial
    slot index, and the 3 lowest bits of an id() will always be zero.

    Here's a patch.
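
    A minimal sketch in C of the idea (the actual change is in the attached
    pointer_hash.patch; the function name here is illustrative):

        #include <stdint.h>

        /* Object pointers are at least 8-byte aligned, so the low 3 bits
           of the address carry no information; shift them out before
           using the address as a hash.  Assumes pointers fit in a long. */
        static long
        hash_pointer_sketch(void *p)
        {
            long x = (long)(uintptr_t)p;
            x >>= 3;               /* drop the always-zero alignment bits */
            if (x == -1)
                x = -2;            /* -1 is reserved for error returns */
            return x;
        }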

    @mdickinson mdickinson added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Feb 8, 2009
    @mdickinson (Member Author)

    Here are some timings for dict creation, created with the attached script.
    They're not particularly scientific, but they at least show that this one-
    line optimization can make a significant difference.

    Typical results on my machine (OS X 10.5.6/Intel), 32-bit non-debug
    build of the trunk (r69442).

    Before:

    dict creation (selected): 1.95572495461
    dict creation (shuffled): 1.98964595795
    dict creation: 1.78589916229

    and after:

    dict creation (selected): 1.7055079937 # (14% speedup)
    dict creation (shuffled): 1.5843398571 # (25% speedup)
    dict creation: 1.32362794876 # (34% speedup)

    BTW, all tests pass on my machine with this patch applied.

    @pitrou (Member) commented Feb 8, 2009

    The code path for SIZEOF_LONG < SIZEOF_VOID_P could probably also
    benefit from this optimization by casting the pointer to a size_t (this
    will affect 64-bit Windows, where long is 32 bits but pointers are 64 bits).

    (unfortunately it seems the 64-bit Windows buildbot has disappeared)
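
    A hedged sketch of that suggestion (illustrative name, not from any
    attached patch): size_t matches the pointer width even on LLP64
    Windows, so the shift can happen at full width before truncating.

        #include <stddef.h>

        static long
        hash_pointer_size_t_sketch(void *p)
        {
            size_t x = (size_t)p;   /* size_t is pointer-sized on LLP64 */
            x >>= 3;                /* same alignment-bit trick */
            return (long)x;         /* truncates to long width for the hash */
        }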

    @pitrou (Member) commented Feb 8, 2009

    Benchmark results on my machine (64-bit Linux, gcc 4.3.2, AMD X2 3600+):

    Before:
    dict creation (selected): 5.09600687027
    dict creation (shuffled): 5.66548895836
    dict creation: 3.72823190689

    After:
    dict creation (selected): 4.57248306274 (10% speedup)
    dict creation (shuffled): 4.81948494911 (15% speedup)
    dict creation: 2.43905687332 (35% speedup)

    @pitrou (Member) commented Feb 8, 2009

    I observe even greater speedups (15%/20%/37%) on set creation. Here is
    the updated benchmark script.

    @mdickinson (Member Author)

    Some comments, while I remember:

    • the argument to _Py_HashPointer is not always divisible by 8. It's
      called to create hashes for various objects, including methodobjects;
      see the line:

          y = _Py_HashPointer((void*)(a->m_ml->ml_meth));

      in meth_hash in methodobject.c, for example; here ml_meth is a C
      function pointer. I can't see how this could be a problem, though,
      especially as it seems very unlikely that two function pointers could
      be less than 8 bytes apart.

    • following from the above, it's also pretty unlikely that any two object
      pointers will be less than 16 bytes apart, so it might be worth seeing if
      performance with >>4 is noticeably any different from with >>3.

    • we should make sure that the value returned by _Py_HashPointer isn't
      the illegal hash value -1 (though it's difficult to see how it could
      be). One safe way to do this that shouldn't cost any CPU cycles would
      be to cast to unsigned long before the right shift, to be sure that
      the right shift zero-extends instead of sign-extending, so that the
      result is guaranteed nonnegative (see the sketch after this list).

    • It *would* be nice to do something about the SIZEOF_LONG < SIZEOF_VOID_P
      case: the current conversion to a PyLong seems like quite an expensive way
      to go. And I've just ordered a computer with 64-bit Windows installed...
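
    On the zero-extension point in the third bullet, a small sketch (the
    function name is illustrative):

        #include <stdint.h>

        static long
        hash_pointer_unsigned_sketch(void *p)
        {
            unsigned long x = (unsigned long)(uintptr_t)p;
            x >>= 3;        /* logical shift: zero-extends, so the result
                               is nonnegative and in particular never -1 */
            return (long)x;
        }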

    @pitrou (Member) commented Feb 10, 2009

    Some tests on py3k (32-bit build):

    >>> l = [object() for i in range(20)]
    >>> [id(l[i+1]) - id(l[i]) for i in range(len(l)-1)]
    [16, -96, 104, 8, 8, 8, 8, 8, -749528, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8]
    >>> class C(object):
    ...    __slots__ = ()
    ... 
    >>> l = [C() for i in range(20)]
    >>> [id(l[i+1]) - id(l[i]) for i in range(len(l)-1)]
    [-104, 24, 8, 8, 8, 8, 8, 8, 8, 8, 8, 16, 8, 8, 8, 8, 16, -8, 16]
    >>> class C(object):
    ...    __slots__ = ('x')
    ... 
    >>> l = [C() for i in range(20)]
    >>> [id(l[i+1]) - id(l[i]) for i in range(len(l)-1)]
    [432, 24, -384, 408, 24, 24, -480, 528, 24, 24, 24, 24, 48, -360, 504,
    24, -480, 552, 24]

    So, as soon as a user-defined type isn't totally trivial, it is
    allocated in at least 24-byte memory units. Shifting by 4 shouldn't be
    detrimental performance-wise, unless you allocate lots of purely empty
    object() instances...

    Note: a 64-bit build shows an even greater allocation unit:

    >>> class C(object):
    ...    __slots__ = ('x')
    ... 
    >>> l = [C() for i in range(20)]
    >>> [id(l[i+1]) - id(l[i]) for i in range(len(l)-1)]
    [56, -112, 168, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56,
    56, 56]

    I wonder why the allocation unit is 56 and not 48 (2*24).

    @pitrou (Member) commented Feb 10, 2009

    On Tuesday 10 February 2009 at 21:18 +0000, Antoine Pitrou wrote:
    > Note: a 64-bit build shows an even greater allocation unit:
    > 
    > >>> class C(object):
    > ...    __slots__ = ('x')
    > ... 
    > >>> l = [C() for i in range(20)]
    > >>> [id(l[i+1]) - id(l[i]) for i in range(len(l)-1)]
    > [56, -112, 168, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56,
    > 56, 56]
    > 
    > I wonder why the allocation unit is 56 and not 48 (2*24).

    I have found the answer. The PyGC_Head forces its own alignment using a
    "long double" dummy, which in 64-bit mode (Linux / gcc) wastes 8 bytes
    between the end of the PyGC_Head and the PyObject itself.
    (SIZEOF_LONG_DOUBLE is 16 in pyconfig.h)
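
    For reference, the PyGC_Head of that era (reproduced from memory, so
    treat it as a sketch; see Include/objimpl.h) forces the alignment
    exactly as described:

        typedef union _gc_head {
            struct {
                union _gc_head *gc_next;
                union _gc_head *gc_prev;
                Py_ssize_t gc_refs;
            } gc;
            long double dummy;  /* force worst-case alignment */
        } PyGC_Head;

    On 64-bit Linux/gcc the union is padded to a multiple of 16 bytes
    (sizeof(long double)), so the 24-byte header occupies 32 bytes; with a
    24-byte slotted instance that gives the observed 56-byte stride.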

    @Rhamphoryncus (Mannequin) commented Feb 11, 2009

    On my 64-bit linux box there's nothing in the last 4 bits:

    >>> [id(o)%16 for o in [object() for i in range(128)]]
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0]

    And with a bit more complicated functions I can determine how much shift
    gives us the lowest collision rate:

    def a(size, shift):
        return len(set((id(o) >> shift) % (size * 2)
                       for o in [object() for i in range(size)]))

    def b(size):
        return [a(size, shift) for shift in range(11)]

    def c():
        for i in range(1, 9):
            size = 2**i
            x = ', '.join('% 3s' % count for count in b(size))
            print('% 3s: %s' % (size, x))
    >>> c()
      2:   1,   1,   1,   2,   2,   1,   1,   1,   2,   2,   2
      4:   1,   1,   2,   3,   4,   3,   2,   4,   4,   3,   2
      8:   1,   2,   4,   6,   6,   7,   8,   6,   4,   3,   2
     16:   2,   4,   7,   9,  12,  13,  12,   8,   5,   3,   2
     32:   4,   8,  14,  23,  30,  25,  19,  12,   7,   4,   2
     64:   8,  16,  32,  55,  64,  38,  22,  13,   8,   4,   2
    128:  16,  32,  64, 114, 128,  71,  39,  22,  12,   6,   3
    256:  32,  64, 128, 242, 242, 123,  71,  38,  20,  10,   5

    The fifth column (i.e. a shift of 4 bits, a divide by 16) works the
    best. Although it varies from run to run, probably more than half the
    results in that column have no collisions at all.

    .. although, if I replace object() with list() I get best results with a
    shift of 6 bits. Replacing it with dict() is best with 8 bits.

    We may want something more complicated.

    @Rhamphoryncus (Mannequin) commented Feb 11, 2009

    Upon further inspection, although a shift of 4 (on a 64-bit linux box)
    isn't perfect for dict, it's fairly close to it and well ahead of
    statistically random hash values. Mixing things more is just going to
    push it down towards random values.

    >>> c()
      2:   1,   1,   1,   2,   2,   1,   1,   1,   1,   1,   2
      4:   1,   1,   2,   3,   4,   3,   3,   2,   2,   2,   3
      8:   1,   2,   4,   7,   8,   7,   5,   6,   7,   5,   5
     16:   2,   4,   7,  11,  16,  15,  12,  14,  15,   9,   7
     32:   3,   5,  10,  18,  31,  30,  30,  30,  31,  20,  12
     64:   8,  14,  23,  36,  47,  54,  59,  59,  61,  37,  21
    128:  16,  32,  58,  83, 118, 100, 110, 114, 126,  73,  41
    256:  32,  64, 128, 195, 227, 197, 203, 240, 253, 150,  78

    @rhettinger (Contributor)

    Instead of a shift, how about a rotate or byteswap in case the lower
    bits ever become significant again in some build.
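
    A rotate keeps every bit: the low (always-zero) bits move to the top
    instead of falling off the end. A minimal sketch:

        #include <limits.h>

        /* Rotate right by k bits; k must be between 1 and nbits-1. */
        static unsigned long
        rotate_right(unsigned long x, unsigned k)
        {
            const unsigned nbits = sizeof(unsigned long) * CHAR_BIT;
            return (x >> k) | (x << (nbits - k));
        }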

    @Rhamphoryncus (Mannequin) commented Feb 11, 2009

    The alignment requirements (long double) make it impossible to have
    anything in those bits.

    Hypothetically, a custom allocator could lower the alignment
    requirements to sizeof(void *). However, rotating to the high bits is
    pointless as they're the least likely to be used — impossible in this
    case, as only the 2 highest bits would contain anything, and for that
    you'd need a dictionary with at least 2 billion entries on 32bit, which
    is more than the 32bit address space. 64-bit is similar.

    Note that mixing the bits back in, via XOR or similar, is actually more
    likely to hurt than help. Just as with ints and strings, whose hash
    values are very sequential, a simple shift tends to give us sequential
    hashes, and that gives a far lower collision rate than a statistically
    random hash.

    @mdickinson (Member Author)

    [Adam Olsen]

    The alignment requirements (long double) make it impossible to have
    anything in those bits.

    Not necessarily, since not all pointers passed to _Py_HashPointer come
    from a PyObject_Malloc. _Py_HashPointer is used for function pointers
    as well. For example, on 64-bit linux I get:

    Python 2.7a0 (trunk:69516, Feb 11 2009, 10:43:51)
    [GCC 4.2.1 (SUSE Linux)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from hashlib import sha224
    >>> hash(sha224)
    47100514454970
    >>> hash(sha224) % 16
    10

    for that you'd need a dictionary with at least 2 billion entries
    on 32bit,

    <nitpick> If I recall correctly, the higher bits of the hash value also
    get used in collision resolution: they're mixed into the algorithm
    that's used to produce the probing sequence when looking for an empty
    slot. So the dictionary wouldn't necessarily have to be quite that
    large for the top bits to come into play. </nitpick>

    But I agree that mixing the bottom bits back in (via rotate, or xor, or
    whatever) doesn't seem likely to help.
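
    For context on the nitpick: the probe loop in Objects/dictobject.c
    feeds the full hash value in through a "perturb" term, so upper bits
    influence the probe order even on small tables. A simplified sketch;
    slot_is_occupied() is a hypothetical stand-in for the real lookup
    logic:

        #include <stddef.h>

        #define PERTURB_SHIFT 5

        extern int slot_is_occupied(size_t i);    /* hypothetical */

        static size_t
        find_slot_sketch(long hash, size_t mask)
        {
            size_t i = (size_t)hash & mask;
            size_t perturb = (size_t)hash;
            while (slot_is_occupied(i)) {
                i = (i * 5 + perturb + 1) & mask;  /* mixes in high bits */
                perturb >>= PERTURB_SHIFT;
            }
            return i;
        }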

    @pitrou (Member) commented Feb 11, 2009

    On Wednesday 11 February 2009 at 03:31 +0000, Adam Olsen wrote:

    .. although, if I replace object() with list() I get best results with a
    shift of 6 bits. Replacing it with dict() is best with 8 bits.

    But list() and dict() don't use id() for hash.

    @mdickinson (Member Author)

    Here's an updated patch, that errs on the conservative side:

    • rotate instead of shifting, as suggested by Raymond. This costs
      very little, and I admit to feeling uncomfortable about the
      possibility of just throwing bits away

    • explicit check for -1

    • special case for sizeof(void *) = 2*sizeof(long)

    All tests pass with the patch applied. I've left the 'convert to
    PyLong' code in as a safety net: it's used on platforms where
    sizeof(void *) > sizeof(long) but sizeof(void *) != 2*sizeof(long). I
    don't know of any such platforms in current use.
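
    A hedged sketch of the shape this conservative variant takes (the real
    code is in the attached pointer_hash2.patch; SIZEOF_VOID_P and
    SIZEOF_LONG are the usual pyconfig.h macros, and the rotate amount of
    3 matches this patch):

        #include <stdint.h>
        #include <limits.h>

        static long
        hash_pointer_rotate_sketch(void *p)
        {
        #if SIZEOF_VOID_P == 2 * SIZEOF_LONG
            /* e.g. 64-bit Windows: fold the high half of the pointer
               into the low half so that all bits participate */
            uintptr_t v = (uintptr_t)p;
            unsigned long x = (unsigned long)v ^
                              (unsigned long)(v >> (8 * SIZEOF_LONG));
        #else
            unsigned long x = (unsigned long)(uintptr_t)p;
        #endif
            const unsigned nbits = sizeof(unsigned long) * CHAR_BIT;
            x = (x >> 3) | (x << (nbits - 3));  /* rotate: nothing lost */
            return (x == (unsigned long)-1) ? -2 : (long)x;
        }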

    Sample timings on 64-bit linux (non-debug trunk build, Core 2 Duo).

    before:

    dict creation (selected): 1.18751096725
    dict creation (shuffled): 1.21234202385
    dict creation: 1.00831198692
    set creation (selected): 0.869561910629
    set creation (shuffled): 0.867420911789
    set creation: 0.77153301239

    and after:

    dict creation (selected): 1.06817317009
    dict creation (shuffled): 0.987659931183
    dict creation: 0.662216901779
    set creation (selected): 0.735805034637
    set creation (shuffled): 0.659453868866
    set creation: 0.445232152939

    @Rhamphoryncus (Mannequin) commented Feb 11, 2009

    Antoine, I only meant list() and dict() to be an example of objects with
    a larger allocation pattern. We get a substantial benefit from the
    sequentially increasing memory addresses, and I wanted to make sure that
    benefit wasn't lost on larger allocations than object().

    Mark, I concede the point about rotating; I believe the cost on x86 is
    the same regardless.

    Why are you still only rotating 3 bits? My results were better with 4
    bits, and that should be the sweet spot for the typical use cases.

    Also, would the use of size_t make this code simpler? It should be the
    size of the pointer even on Windows.

    @mdickinson (Member Author)

    I'm fine with rotating 4 bits instead of 3, especially if the timings look
    good on 32-bit as well as 64-bit.

    We should really benchmark dict lookups (and set membership tests) as well
    as dict creation.

    @rhettinger (Contributor)

    At four bits, you may be throwing away information and I don't think
    that's cool. Even if some selected timings are better with more bits
    shifted, all you're really showing is that there is more randomness in
    the upper bits than the lower ones. But that doesn't mean that the
    lower ones contribute nothing at all.

    I'm *much* more comfortable with a byte-swap, rotation, or xoring-in
    upper bits than with shifts that potentially destroy entropy.
    Otherwise, you're taxing apps that build giant sets/dicts and need all
    distinguishing bits to avoid collision pile-ups.

    @pitrou (Member) commented Feb 11, 2009

    I'm *much* more comfortable with a byte-swap, rotation, or xoring-in
    upper bits than with shifts that potentially destroy entropy.
    Otherwise, you're taxing apps that build giant sets/dicts and need all
    distinguishing bits to avoid collision pile-ups.

    Would (id() >> 4) + (id() & 15) be ok?

    @mdickinson (Member Author)

    At four bits, you may be throwing away information and I don't think
    that's cool.

    The current patch *does* do a rotation: it doesn't throw away any
    information.

    @pitrou (Member) commented Feb 11, 2009

    Here is an updated patch. It uses a 4-bit shift and an addition. We
    should avoid the use of bitwise OR, because it makes the outputs
    non-uniformly distributed ('1' bits are more likely).

    @pitrou (Member) commented Feb 11, 2009

    Here is an updated benchmark script, for both object() and a
    user-defined class, and adding dict lookup, set lookup and set
    difference. Set difference benefits massively: it is up to 60% faster.

    @rhettinger (Contributor)

    Guys, let's be careful. Make sure that efforts to randomize lower bits
    don't destroy information. Something like x |= x>>8 is reversible and
    fast. Other fun looking transformations are not:

        >>> len(set((x >> 4) + (x & 15) for x in range(10**6)))
        62515

    @pitrou (Member) commented Feb 11, 2009

    Ok, updated patch:

    • uses a 4-bit rotate (not shift)
    • avoids comparing an unsigned long to -1
    • tries to streamline the win64 special path (but I can't test)

    @Rhamphoryncus (Mannequin) commented Feb 12, 2009

    At four bits, you may be throwing away information and I don't think
    that's cool. Even if some selected timings are better with more bits
    shifted, all you're really showing is that there is more randomness in
    the upper bits than the lower ones. But that doesn't mean that the
    lower ones contribute nothing at all.

    On the contrary, the expected collision rate for a half-full dictionary
    is about 21%, whereas I'm getting less than 5%. I'm taking advantage of
    the sequentiality of addresses, just as int and str hashes do for their
    values.

    However, you're right that it's only one use case. Although creating a
    burst of objects for a throw-away set may itself be common, it's
    typically with int or str, and doing it with custom objects is
    presumably fairly rare; certainly not a good microbenchmark for the rest
    of the interpreter.

    Creating a list of 100000 objects, then shuffling and picking a few
    increases my collision rate back up to 21%. That should more accurately
    reflect a long-running program using custom objects as keys in a dict.

    That said, I still prefer the simplicity of a rotate. Adding an
    arbitrary set of OR, XOR, or add makes me uneasy; I know enough to do
    them wrong (reduce entropy), but not enough to do them right.

    @rhettinger (Contributor)

    [Antoine]

    Ok, updated patch:

    • uses a 4-bit rotate (not shift)
    • avoids comparing an unsigned long to -1
    • tries to streamline the win64 special path (but I can't test)

    pointer_hash4.patch looks fine to me. Still, I think it's worth
    considering the simpler and faster: x |= x>>4. The latter doesn't
    require any special-casing for various pointer sizes. It just works.

    [Adam]

    Adding an arbitrary set of OR, XOR, or add makes me uneasy;
    I know enough to do them wrong (reduce entropy), but not
    enough to do them right.

    It's easy enough to prove (just show that the function is reversible)
    and easy enough to test:

    assert len(set(ids)) == len(set(map(f, set(ids))))  # for any large group of ids

    @lambertdw (Mannequin) commented Feb 12, 2009

    "x |= x>>4"

    Are you (Ray) sure you didn't mean

    "x ^= x>>4" ?

    @Rhamphoryncus (Mannequin) commented Feb 12, 2009

    Testing with a large set of ids is a good demonstration, but not proof.
    Forming a set of *all* possible values within a certain range is proof.

    However, XOR does work (OR definitely does not): it's a 1-to-1
    transformation (reversible, as you say).

    Additionally, it still gives the unnaturally low collision rate when
    using sequential addresses, so there's no objection there.
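
    Adam's point, sketched exhaustively over all 16-bit values: x ^ (x >> 4)
    hits all 65536 outputs (a bijection, hence reversible), while
    x | (x >> 4) does not:

        #include <stdio.h>

        int main(void)
        {
            static unsigned char seen_xor[1 << 16], seen_or[1 << 16];
            unsigned n_xor = 0, n_or = 0, x;
            for (x = 0; x < (1u << 16); x++) {
                unsigned a = x ^ (x >> 4);   /* reversible transform */
                unsigned b = x | (x >> 4);   /* lossy transform */
                if (!seen_xor[a]) { seen_xor[a] = 1; n_xor++; }
                if (!seen_or[b])  { seen_or[b]  = 1; n_or++;  }
            }
            printf("xor: %u distinct of 65536\n", n_xor);  /* 65536 */
            printf("or:  %u distinct of 65536\n", n_or);   /* fewer  */
            return 0;
        }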

    @rhettinger (Contributor)

    David, yes, I did mean x ^= x>>4;
    How embarrassing.

    @mdickinson (Member Author)

    > - avoids comparing an unsigned long to -1

    Just out of interest, why? The cast is unnecessary: there's no ambiguity
    or undefinedness (the int -1 gets promoted to unsigned long, with
    wraparound semantics), and neither gcc nor MSVC complains.

    Other than that, the patch looks fine to me; x ^= x >> 4 would be fine
    too. I really don't see that it makes much difference either way, since
    both transformations are reversible and fast enough.

    @pitrou (Member) commented Feb 12, 2009

    Mark:

    Just out of interest, why? The cast is unnecessary: there's no ambiguity
    or undefinedness (the int -1 gets promoted to unsigned long, with
    wraparound semantics), and neither gcc nor MSVC complains.

    Well, I had memories of a weird signed/unsigned problem (bpo-4935) and
    I wasn't sure whether it could raise its head in the present case or
    not.

    Raymond:

    The latter doesn't
    require any special-casing for various pointer sizes.

    The special casing is just there so as to make all pointer bits
    participate in the final hash (which is what the original implementation
    does). Otherwise we could just unconditionally cast to unsigned long.

    @mdickinson (Member Author)

    Well, I had memories of a weird signed/unsigned problem (bpo-4935) and
    I wasn't sure whether it could raise its head in the present case or
    not.

    I'm 99.9% sure that it's not a problem here. If it is a problem then it
    needs to be fixed in long_hash in longobject.c as well, which uses
    exactly the same code.

    The relevant section of the C99 standard is 6.3.1.8, para. 1 (try
    googling for 'n1256' if you don't have a copy of the standard). The
    only really nasty cases are those of the form unsigned_int +
    signed_long, or more generally,

    low_rank_unsigned_integer binary_op higher_rank_signed_integer

    where the type of the expression depends on the relative sizes (not just
    ranks) of the integer types, giving potential portability problems. And
    there are similar problems with the integer promotions (6.3.1.1, para. 2).

    I guess it comes down to personal taste, but my own preference is to
    leave out casts where the conversion they describe is already implied by
    the C language rules, adding them back in to silence compiler warnings
    if necessary. I find it reduces noise in the code and makes the
    important casts more visible, but to each his own.
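
    To make the promotion concrete: in x == -1 with x of type unsigned
    long, the usual arithmetic conversions (C99 6.3.1.8) convert the int
    -1 to unsigned long, i.e. to ULONG_MAX, so the explicit cast changes
    nothing:

        #include <limits.h>
        #include <stdio.h>

        int main(void)
        {
            unsigned long x = ULONG_MAX;
            printf("%d\n", x == -1);                 /* 1: -1 converts to ULONG_MAX */
            printf("%d\n", x == (unsigned long)-1);  /* 1: identical comparison */
            return 0;
        }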

    @pitrou (Member) commented Feb 12, 2009

    Other than that, the patch looks fine to me; x ^= x >> 4 would be fine
    too.

    I've just tried x ^= x >> 4 and the speedup is smaller on our
    microbenchmark (time_object_hash.py). I conjecture that trying to
    maintain the sequentiality of addresses may have beneficial cache
    locality effects. Should we care?

    @mdickinson (Member Author)

    +1 for checking in pointer_hash4.patch, provided Raymond approves.

    @rhettinger (Contributor)

    Consider it approved. Though I prefer you switch to x ^= x >> 4.

    @mdickinson (Member Author)

    Though I prefer you switch to x ^= x >> 4.

    Okay, how about this one? Short and sweet. No loss of information
    except when sizeof(void *) > sizeof(long) (unavoidable until someone
    finds a way to fit all 64-bit pointers into a 32-bit integer type...)

    @pitrou (Member) commented Feb 12, 2009

    Okay, how about this one?

    Apart from the misindentation (the file should use tabs not spaces),
    have you run the benchmark script with it?

    @Rhamphoryncus (Mannequin) commented Feb 12, 2009

    Antoine, x ^= x>>4 has a higher collision rate than just a rotate.
    However, it's still lower than a statistically random hash.

    If you modify the benchmark to randomly discard 90% of its contents this
    should give you random addresses, reflecting a long-running program.

    Here are the results I got (I used a shift, too lazy to rotate):

    XOR, sequential: 20.174627065692999
    XOR, random: 30.460708379770004
    shift, sequential: 19.148091554626003
    shift, random: 30.495631933229998
    original, sequential: 23.736469268799997
    original, random: 33.536177158379999

    Not massive, but still worth fixing the hash.

    @mdickinson (Member Author)

    Apart from the misindentation

    Apologies. My fault for editing Python files while at work, with a
    substandard editor configuration...

    have you run the benchmark script with it?

    I have now. See attached file for 3 sets of results (original, xor
    version, and rotate) on 64-bit Linux/Core 2 Duo.

    Summary: rotate is uniformly and significantly faster than xor; xor is
    uniformly and significantly faster than the unpatched version.

    @pitrou (Member) commented Feb 13, 2009

    Ok, so the rotate version is really significantly faster (and, as Adam
    pointed out, it's also theoretically better).

    @mdickinson (Member Author)

    Antoine, please check in a patch of your choice. I think we've beaten
    this issue to death already. :-)

    @pitrou pitrou self-assigned this Feb 13, 2009
    @pitrou (Member) commented Feb 13, 2009

    pointer_hash5_rotate.patch has been committed to trunk and py3k. Thanks!

    @pitrou pitrou closed this as completed Feb 13, 2009
    @rhettinger (Contributor)

    Sweet!

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022