Merge pull request #3703 from stuartarchibald/wip/hash_impls
Implementations of type hashing.
Showing 17 changed files with 1,068 additions and 161 deletions.
@@ -0,0 +1,57 @@

================
Notes on Hashing
================

Numba supports the built-in :func:`hash` by calling the :func:`__hash__`
member function on the supplied argument. This makes it trivial to add hash
support for a new type: apply the extension API :func:`overload_method`
decorator to a function that computes the hash value for the new type, and
register it as the type's :func:`__hash__` method. For example::
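
    from numba.extending import overload_method

    # myType is a placeholder for the Numba type class of the new type
    @overload_method(myType, '__hash__')
    def myType_hash_overload(obj):
        # implementation details: return the function that computes the hash
        def impl(obj):
            ...
        return impl

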
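As a plain-Python analogy (this is not Numba code), the same dispatch can be observed in the interpreter: :func:`hash` calls the ``__hash__`` method defined on the argument's type. The ``Point`` class below is hypothetical and exists only for illustration.

```python
class Point:
    """Hypothetical example type; not part of Numba."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Delegate to the hash of a tuple of the fields so that equal
        # points hash equally.
        return hash((self.x, self.y))


p = Point(1, 2)
# The built-in hash() dispatches to the type's __hash__ method.
assert hash(p) == p.__hash__() == hash((1, 2))
```

In Numba the role of ``Point.__hash__`` is played by the implementation returned from the ``overload_method`` overload.
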
The Implementation
==================

The implementation of the Numba hashing functions strictly follows that of
Python 3. The only exception is that, for hashing Unicode and bytes (for
content longer than ``sys.hash_info.cutoff``), the only supported algorithm is
``siphash24`` (default in CPython 3). As a result, Numba will match Python 3
hash values for all supported types under the default conditions described.
Python 2 hashing support is set up to follow Python 3, with similar defaults
hard-coded for this purpose; perhaps most noticeably,
``sys.hash_info.cutoff`` is set to zero.
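
The CPython settings referred to above can be inspected from the interpreter. A minimal sketch using only the standard library; the reported algorithm depends on the CPython version and build, so it may not be ``siphash24`` on every interpreter.

```python
import sys

info = sys.hash_info

# Name of the algorithm used to hash str, bytes and memoryview objects.
print("algorithm:", info.algorithm)
# Cutoff for the small-string optimisation; zero means it is disabled.
print("cutoff:", info.cutoff)
# Width in bits of hash values.
print("width:", info.width)
```
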

Unicode hash cache differences
------------------------------

Both the Numba and CPython internal representations of Unicode strings have a
``hash`` member for caching the string's hash value. This member is always
checked before a hash value is computed, with a view to simply returning the
cached value, as doing so is considerably cheaper. The Numba Unicode string
hash caching implementation behaves similarly to CPython's. The only notable
behavioral difference (and its only impact is a minor potential change in
performance) is that Numba always computes and caches the hash for Unicode
strings created in ``nopython mode`` at the time they are boxed for reuse in
Python. This is too eager in some cases compared with CPython, which may delay
hashing a new Unicode string depending on how it was created. It should also
be noted that Numba copies the ``hash`` member of the CPython internal
representation when unboxing a Unicode string to its own representation, so as
not to recompute the hash of a string that already has a hash value associated
with it.
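
The caching behavior described above can be modelled in plain Python. This is a toy sketch, not Numba's or CPython's implementation: the class ``CachedHashString``, the ``box`` helper and the ``-1`` sentinel are all assumptions made for illustration.

```python
class CachedHashString:
    """Toy model of a string object with a cached ``hash`` member."""

    _UNSET = -1  # assumed sentinel meaning "hash not computed yet"

    def __init__(self, value, precomputed_hash=_UNSET):
        self.value = value
        self.hash = precomputed_hash
        self.computations = 0  # how often the hash was actually computed

    def get_hash(self):
        # Always consult the cache first; compute only on a miss.
        if self.hash == self._UNSET:
            self.computations += 1
            self.hash = hash(self.value)
        return self.hash


def box(s):
    """Model of the eager behavior: cache the hash before handing the string on."""
    s.get_hash()
    return s


# Lazy path: the hash is computed once, on first use, then served from cache.
lazy = CachedHashString("numba")
assert lazy.computations == 0
lazy.get_hash()
lazy.get_hash()
assert lazy.computations == 1

# Eager path: boxing computes the hash even if nobody asks for it later.
eager = box(CachedHashString("numba"))
assert eager.computations == 1

# Unboxing path: an existing hash is copied in, so nothing is recomputed.
copied = CachedHashString("numba", precomputed_hash=lazy.hash)
copied.get_hash()
assert copied.computations == 0
```
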

The accommodation of ``PYTHONHASHSEED``
---------------------------------------

The ``PYTHONHASHSEED`` environment variable can be used to seed the CPython
hashing algorithms, e.g. for the purposes of reproducibility. The Numba hashing
implementation directly reads the CPython hashing algorithms' internal state
and, as a result, the influence of ``PYTHONHASHSEED`` is replicated in Numba's
hashing implementations.
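
The effect of the seed on CPython itself can be checked with the standard library alone. The sketch below does not exercise Numba; the helper ``hash_in_fresh_interpreter`` is a hypothetical name introduced for this example. It starts new interpreters because the seed is read only at interpreter start-up.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text, seed):
    """Return hash(text) as computed by a new interpreter run with the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout)


# The same seed gives the same string hash in independent interpreter runs.
first = hash_in_fresh_interpreter("numba", seed=42)
second = hash_in_fresh_interpreter("numba", seed=42)
assert first == second
```
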
@@ -18,4 +18,5 @@ Developer Manual
   stencil.rst
   custom_pipeline.rst
   environment.rst
   hashing.rst
   roadmap.rst