Merged
2 changes: 1 addition & 1 deletion MANIFEST.in
@@ -1,7 +1,7 @@
include LICENSE
include README.rst
include cpuinfo.py
include include/*.h
include src/*.h
include src/*.cc
include src/*.cpp
include src/*.pyx
26 changes: 16 additions & 10 deletions README.rst
@@ -60,10 +60,12 @@ MetroHash algorithm. For stateless hashing, it exports ``metrohash64`` and
Incremental hashing
~~~~~~~~~~~~~~~~~~~

For incremental hashing, use ``MetroHash64`` and ``MetroHash128`` classes.
Incremental hashing is associative and guarantees that any combination of input
slices will result in the same final hash value. This is useful for processing
large inputs and stream data. Example with two slices:
Unlike its cousins CityHash and FarmHash, MetroHash allows incremental
(stateful) hashing. For incremental hashing, use ``MetroHash64`` and
``MetroHash128`` classes. Incremental hashing is associative and guarantees
that any combination of input slices will result in the same final hash value.
This is useful for processing large inputs and stream data. Example with two
slices:

.. code-block:: python

@@ -85,10 +87,14 @@ Note that the resulting hash value above is the same as in:
Buffer protocol support
~~~~~~~~~~~~~~~~~~~~~~~

The methods in this module support Python `Buffer Protocol
<https://docs.python.org/3/c-api/buffer.html>`__, which allows them to be used
on any object that exports a buffer interface. Here is an example showing
hashing of a 4D NumPy array:
The Python `Buffer Protocol <https://docs.python.org/3/c-api/buffer.html>`__
allows Python objects to expose their data as raw byte arrays to other objects,
for fast access without copying to a separate location in memory. Among
others, NumPy is a major framework that supports this protocol.

All hashing functions in this package will read byte arrays from objects that
expose them via the buffer protocol. Here is an example showing hashing of a 4D
NumPy array:

.. code-block:: python

@@ -97,8 +103,8 @@ hashing of a 4D NumPy array:
>>> metrohash.hash64_int(arr)
12125832280816116063

Note that arrays need to be contiguous for this to work. To convert a
non-contiguous array, use ``np.ascontiguousarray()`` method.
The arrays need to be contiguous for this to work. To convert a non-contiguous
array, use NumPy's ``ascontiguousarray()`` function.

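The contiguity requirement comes from the buffer protocol itself rather than from NumPy, so it can be illustrated with the standard library alone. The following is a minimal sketch under stated assumptions: ``hashlib.sha256`` stands in for this package's hash functions, and ``hash_buffer`` is a hypothetical helper; neither is part of metrohash. The copy made by ``tobytes()`` plays the same role as ``ascontiguousarray()``.

```python
import array
import hashlib


def hash_buffer(obj):
    """Hash the raw bytes of any object that exports a buffer.

    hashlib.sha256 is only a stand-in for a real hash function here.
    """
    view = memoryview(obj)
    if not view.contiguous:
        # A strided view cannot be read as one flat block of memory,
        # so copy its elements into a contiguous bytes object first.
        view = memoryview(view.tobytes())
    return hashlib.sha256(view).hexdigest()


# array.array exports its data through the buffer protocol.
data = array.array("d", [1.0, 2.0, 3.0, 4.0])

# Taking every second element yields a non-contiguous view.
strided = memoryview(data)[::2]

print(strided.contiguous)
print(hash_buffer(strided) == hash_buffer(array.array("d", [1.0, 3.0])))
```

The contiguous input is hashed without an intermediate copy; only the strided view pays for one.
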
Development
-----------
2 changes: 1 addition & 1 deletion cpp.mk
@@ -2,7 +2,7 @@ CXX := g++
CXXFLAGS := -std=c++11 -O3 -msse4.2
LDFLAGS :=
SRCEXT := cc
INC := -I include
INC := -I src
LIB := -L lib

INPUT := ./data/sample_100k.txt
4 changes: 2 additions & 2 deletions pip-freeze.txt
@@ -8,7 +8,7 @@ ipdb==0.13.9
ipython==7.30.1
jedi==0.18.1
matplotlib-inline==0.1.3
-e git+https://github.com/escherba/python-metrohash@03ec8e4b0b21bf4a9726b3625d5ebc6e791b2a82#egg=metrohash
-e git+https://github.com/escherba/python-metrohash@cffe9bc1c0b48c2269c9e0abe1fb564ccf86d41f#egg=metrohash
numpy==1.21.5
packaging==21.3
parso==0.8.3
@@ -18,7 +18,7 @@ pluggy==1.0.0
prompt-toolkit==3.0.24
ptyprocess==0.7.0
py==1.11.0
Pygments==2.11.0
Pygments==2.11.1
pyparsing==3.0.6
pytest==6.2.5
toml==0.10.2
68 changes: 35 additions & 33 deletions setup.py
@@ -31,7 +31,9 @@ def is_pure(self):

CXXFLAGS = []

print(f"building for platform: {os.name}")
print("building for platform: %s" % os.name)
print("available CPU flags: %s" % CPU_FLAGS)

if os.name == "nt":
    CXXFLAGS.extend(["/O2"])
else:
@@ -42,58 +44,58 @@ def is_pure(self):
])


if 'ssse3' in CPU_FLAGS:
    print("Compiling with SSSE3 enabled")
    CXXFLAGS.append('-mssse3')
else:
    print("compiling without SSSE3 support")


if 'sse4_2' in CPU_FLAGS:
    print("Compiling with SSE4.2 enabled")
    CXXFLAGS.append('-msse4.2')
else:
    print("compiling without SSE4.2 support")


INCLUDE_DIRS = ['include']
INCLUDE_DIRS = ['src']
CXXHEADERS = [
    "include/metro.h",
    "include/metrohash.h",
    "include/metrohash128.h",
    "include/metrohash128crc.h",
    "include/metrohash64.h",
    "include/platform.h",
    "src/metro.h",
    "src/metrohash.h",
    "src/metrohash128.h",
    "src/metrohash128crc.h",
    "src/metrohash64.h",
    "src/platform.h",
]
CXXSOURCES = [
    "src/metrohash64.cc",
    "src/metrohash128.cc",
]

CMDCLASS = {}
EXT_MODULES = []

if USE_CYTHON:
    print("building extension using Cython")
    CMDCLASS['build_ext'] = build_ext
    EXT_MODULES.append(
        Extension(
            "metrohash",
            CXXSOURCES + ["src/metrohash.pyx"],
            depends=CXXHEADERS,
            language="c++",
            extra_compile_args=CXXFLAGS,
            include_dirs=INCLUDE_DIRS,
        )
    )
    CMDCLASS = {'build_ext': build_ext}
    SRC_EXT = ".pyx"
else:
    print("building extension w/o Cython")
    EXT_MODULES.append(
        Extension(
            "metrohash",
            CXXSOURCES + ["src/metrohash.cpp"],
            depends=CXXHEADERS,
            language="c++",
            extra_compile_args=CXXFLAGS,
            include_dirs=INCLUDE_DIRS,
        )
    )


VERSION = '0.1.1.post2'
    CMDCLASS = {}
    SRC_EXT = ".cpp"


EXT_MODULES = [
    Extension(
        "metrohash",
        CXXSOURCES + ["src/metrohash" + SRC_EXT],
        depends=CXXHEADERS,
        language="c++",
        extra_compile_args=CXXFLAGS,
        include_dirs=INCLUDE_DIRS,
    ),
]

VERSION = '0.1.1.post3'
URL = "https://github.com/escherba/python-metrohash"

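The per-flag ``if``/``else`` checks in ``setup.py`` follow one pattern: if a CPU feature is reported, append the matching compiler option. A lookup table is one way to express that pattern. This is a minimal sketch, not the project's code: ``SIMD_OPTIONS`` and ``simd_compile_flags`` are hypothetical names, and only the flag names and options are taken from the diff.

```python
# Hypothetical table mapping a reported CPU feature to the GCC/Clang-style
# option that enables it (strings as used in setup.py).
SIMD_OPTIONS = {
    "ssse3": "-mssse3",
    "sse4_2": "-msse4.2",
}


def simd_compile_flags(cpu_flags):
    """Return compiler options for every known feature found in cpu_flags."""
    selected = []
    for feature, option in SIMD_OPTIONS.items():
        if feature in cpu_flags:
            print("Compiling with %s enabled" % feature)
            selected.append(option)
        else:
            print("compiling without %s support" % feature)
    return selected


print(simd_compile_flags({"sse2", "sse4_2"}))
```

Adding support for another instruction set then means adding one table entry instead of another ``if``/``else`` block.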

File renamed without changes.