Commit

removing references to murmur3
seperman committed Jan 1, 2021
1 parent d04af64 commit 7f70159
Showing 7 changed files with 18 additions and 79 deletions.
File renamed without changes.
57 changes: 1 addition & 56 deletions README.md
@@ -27,14 +27,6 @@ Tested on Python 3.6+ and PyPy3.

`pip install deepdiff`

DeepDiff prefers to use Murmur3 for hashing. However you have to manually install Murmur3 by running:

`pip install 'deepdiff[murmur]'`

Otherwise DeepDiff will be using SHA256 for hashing which is a cryptographic hash and is considerably slower.

If you are running into trouble installing Murmur3, please take a look at the [Troubleshoot](#troubleshoot) section.

### Importing

```python
@@ -399,23 +391,6 @@ And here is more info: <http://zepworks.com/blog/diff-it-to-digg-it/>

<http://deepdiff.readthedocs.io/en/latest/>

# Troubleshoot

## Murmur3

`Failed to build mmh3 when installing DeepDiff`

DeepDiff prefers to use Murmur3 for hashing. However you have to manually install murmur3 by running: `pip install mmh3`

On MacOS Mojave some user experience difficulty when installing Murmur3.

The problem can be solved by running:

`xcode-select --install`

And then running

`pip install mmh3`

# ChangeLog

@@ -444,34 +419,4 @@ Thank you!

# Authors

Authors listed in the order of the contributions:

- [Sep Dehpour (Seperman)](http://www.zepworks.com)
- [Victor Hahn Castell](http://hahncastell.de) for the tree view and major contributions:
- [nfvs](https://github.com/nfvs) for Travis-CI setup script.
- [brbsix](https://github.com/brbsix) for initial Py3 porting.
- [WangFenjin](https://github.com/WangFenjin) for unicode support.
- [timoilya](https://github.com/timoilya) for comparing list of sets when ignoring order.
- [Bernhard10](https://github.com/Bernhard10) for significant digits comparison.
- [b-jazz](https://github.com/b-jazz) for PEP257 cleanup, Standardize on full names, fixing line endings.
- [finnhughes](https://github.com/finnhughes) for fixing __slots__
- [moloney](https://github.com/moloney) for Unicode vs. Bytes default
- [serv-inc](https://github.com/serv-inc) for adding help(deepdiff)
- [movermeyer](https://github.com/movermeyer) for updating docs
- [maxrothman](https://github.com/maxrothman) for search in inherited class attributes
- [maxrothman](https://github.com/maxrothman) for search for types/objects
- [MartyHub](https://github.com/MartyHub) for exclude regex paths
- [sreecodeslayer](https://github.com/sreecodeslayer) for DeepSearch match_string
- Brian Maissy [brianmaissy](https://github.com/) for weakref fix, enum tests
- Bartosz Borowik [boba-2](https://github.com/boba-2) for Exclude types fix when ignoring order
- Brian Maissy [brianmaissy](https://github.com/brianmaissy) for fixing classes which inherit from classes with slots didn't have all of their slots compared
- Juan Soler [Soleronline](https://github.com/Soleronline) for adding ignore_type_number
- [mthaddon](https://github.com/mthaddon) for adding timedelta diffing support
- [Necrophagos](https://github.com/Necrophagos) for Hashing of the number 1 vs. True
- [gaal-dev](https://github.com/gaal-dev) for adding exclude_obj_callback
- Ivan Piskunov [van-ess0](https://github.com/van-ess0) for deprecation warning enhancement.
- Michał Karaś [MKaras93](https://github.com/MKaras93) for the pretty view
- Christian Kothe [chkothe](https://github.com/chkothe) for the basic support for diffing numpy arrays
- [Timothy](https://github.com/timson) for truncate_datetime
- [d0b3rm4n](https://github.com/d0b3rm4n) for bugfix to not apply format to non numbers.
- [MyrikLD](https://github.com/MyrikLD) for Bug Fix NoneType in ignore type groups
Please take a look at the [AUTHORS](AUTHORS.md) file.
2 changes: 1 addition & 1 deletion deepdiff/deephash.py
@@ -14,7 +14,6 @@
logger = logging.getLogger(__name__)

UNPROCESSED_KEY = 'unprocessed'
MURMUR_SEED = 1203

RESERVED_DICT_KEYS = {UNPROCESSED_KEY}
EMPTY_FROZENSET = frozenset()
@@ -47,6 +46,7 @@ def combine_hashes_lists(items, prefix):
Combines lists of hashes into one hash
This can be optimized in future.
It needs to work with both murmur3 hashes (int) and sha256 (str)
Although murmur3 is not used anymore.
"""
if isinstance(prefix, bytes):
prefix = prefix.decode('utf-8')
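The docstring in this hunk says the function must accept both int hashes (murmur3) and str hashes (sha256). A minimal sketch of one way to do that is shown here. It is not DeepDiff's actual implementation: the name `combine_hashes_lists_sketch`, the `--` separator, the per-list sorting, and the use of `hashlib.sha256` for the final digest are all assumptions made for illustration.

```python
import hashlib


def combine_hashes_lists_sketch(items, prefix):
    """Combine lists of hashes (int or str) into one prefixed hash string.

    Hypothetical sketch, not DeepDiff's actual implementation.
    """
    if isinstance(prefix, bytes):
        prefix = prefix.decode('utf-8')
    combined = b''
    for item in items:
        # str() puts int hashes and str hashes on the same footing.
        # Sorting afterwards makes the order inside each list irrelevant
        # (an assumption of this sketch).
        combined += (''.join(sorted(map(str, item))) + '--').encode('utf-8')
    return prefix + hashlib.sha256(combined).hexdigest()
```

Converting every hash with `str()` before sorting avoids comparing ints with strings, which is why the same code path can serve either kind of hash.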
21 changes: 9 additions & 12 deletions docs/deephash_doc.rst
@@ -7,10 +7,7 @@ The main usage of DeepHash is to calculate the hash of otherwise unhashable obje
For example you can use DeepHash to calculate the hash of a set or a dictionary!

At the core of it, DeepHash is a deterministic serialization of your object into a string so it
can be passed to a hash function. By default it uses Murmur 3 128 bit hash function which is a
fast, non-cryptographic hashing function. You have the option to pass any another hashing function to be used instead.

If it can't find Murmur3 package (mmh3) installed, it uses Python's built-in SHA256 for hashing which is considerably slower than Murmur3. So it is advised that you install Murmur3 by running `pip install 'deepdiff[murmur]`
can be passed to a hash function. By default it uses SHA256. You have the option to pass any another hashing function to be used instead.

**Import**
>>> from deepdiff import DeepHash
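The "deterministic serialization, then hash" idea described in this hunk can be illustrated with a toy sketch that uses only the standard library. It is not how DeepHash is implemented: `serialize` and `toy_deep_hash` are hypothetical names, and the string format is an assumption chosen for readability.

```python
import hashlib


def serialize(obj):
    """Turn nested dicts, sets, lists and scalars into a deterministic string."""
    if isinstance(obj, dict):
        # Sort the serialized items so insertion order does not matter.
        parts = sorted(f"{serialize(k)}:{serialize(v)}" for k, v in obj.items())
        return "dict:{" + ";".join(parts) + "}"
    if isinstance(obj, (set, frozenset)):
        # Sets have no order, so sort the serialized members.
        return "set:{" + ";".join(sorted(serialize(i) for i in obj)) + "}"
    if isinstance(obj, (list, tuple)):
        # Lists and tuples keep their order.
        return type(obj).__name__ + ":[" + ",".join(serialize(i) for i in obj) + "]"
    # Prefix scalars with their type name so 1 and True serialize differently.
    return f"{type(obj).__name__}:{obj!r}"


def toy_deep_hash(obj):
    """Hash the deterministic serialization with SHA256."""
    return hashlib.sha256(serialize(obj).encode("utf-8")).hexdigest()
```

With this sketch, `toy_deep_hash({'a': {1, 2}})` works even though a dict or set cannot be passed to Python's built-in `hash`, and two equal containers built in a different order give the same digest.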
@@ -39,20 +36,21 @@ exclude_obj_callback
A function that takes the object and its path and returns a Boolean. If True is returned, the object is excluded from the results, otherwise it is included.
This is to give the user a higher level of control than one can achieve via exclude_paths, exclude_regex_paths or other means.

hasher: function. default = DeepHash.murmur3_128bit
hasher is the hashing function. The default is DeepHash.murmur3_128bit.
hasher: function. default = DeepHash.sha256hex
hasher is the hashing function. The default is DeepHash.sha256hex.
But you can pass another hash function to it if you want.
For example a cryptographic hash function or Python's builtin hash function.
All it needs is a function that takes the input in string format and returns the hash.

You can use it by passing: hasher=hash for Python's builtin hash.

The following alternatives are already provided:
The following alternative is already provided:

- hasher=DeepHash.murmur3_128bit
- hasher=DeepHash.murmur3_64bit
- hasher=DeepHash.sha1hex

Note that prior to DeepDiff 5.2, Murmur3 was the default hash function.
But Murmur3 is removed from DeepDiff dependencies since then.

ignore_repetition: Boolean, default = True
If repetitions in an iterable should cause the hash of iterable to be different.
Note that the deepdiff diffing functionality lets this to be the default at all times.
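The `hasher` entry in this hunk only requires "a function that takes the input in string format and returns the hash". A minimal sketch of two functions with that shape, written with `hashlib`, follows. The names echo `DeepHash.sha256hex` and `DeepHash.sha1hex` from the documentation, but these are stand-ins and may differ from the library's own code; accepting `bytes` as well as `str` is an assumption.

```python
import hashlib


def sha256hex(obj):
    """Return the SHA256 hex digest of a str or bytes input."""
    if isinstance(obj, str):
        obj = obj.encode('utf-8')
    return hashlib.sha256(obj).hexdigest()


def sha1hex(obj):
    """Return the SHA1 hex digest of a str or bytes input."""
    if isinstance(obj, str):
        obj = obj.encode('utf-8')
    return hashlib.sha1(obj).hexdigest()
```

Going by the documentation text, a function of this shape is what would be passed as `hasher=...`. Note that Python's built-in `hash` is randomized per process for strings unless `PYTHONHASHSEED` is set, so `hasher=hash` results are not stable across runs.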
@@ -165,10 +163,9 @@ But with DeepHash:

At first it might seem weird why DeepHash(obj)[obj] but remember that DeepHash(obj) is a dictionary of hashes of all other objects that obj contains too.

The result hash is 34150898645750099477987229399128149852 which is generated by
Murmur 3 128bit hashing algorithm. If you prefer to use another hashing algorithm, you can pass it using the hasher parameter. Read more about Murmur3 here: https://en.wikipedia.org/wiki/MurmurHash
The result hash is 34150898645750099477987229399128149852. If you prefer to use another hashing algorithm, you can pass it using the hasher parameter.

If you do a deep copy of obj, it should still give you the same hash:
If you do a deep copy of the obj, it should still give you the same hash:

>>> from copy import deepcopy
>>> obj2 = deepcopy(obj)
4 changes: 2 additions & 2 deletions docs/diff_doc.rst
@@ -57,8 +57,8 @@ get_deep_distance: Boolean, default = False
group_by: String, default=None
:ref:`group_by_label` can be used when dealing with list of dictionaries to convert them to group them by value defined in group_by. The common use case is when reading data from a flat CSV and primary key is one of the columns in the CSV. We want to use the primary key to group the rows instead of CSV row number.

hasher: default = DeepHash.murmur3_128bit
Hash function to be used. If you don't want Murmur3, you can use Python's built-in hash function
hasher: default = DeepHash.sha256hex
Hash function to be used. If you don't want SHA256, you can use your own hash function
by passing hasher=hash. This is for advanced usage and normally you don't need to modify it.

ignore_order : Boolean, default=False
9 changes: 1 addition & 8 deletions docs/index.rst
@@ -49,14 +49,7 @@ Install from PyPi::

pip install deepdiff

DeepDiff prefers to use Murmur3 for hashing. However you need to manually install Murmur3 by running::

pip install 'deepdiff[murmur]'

Otherwise DeepDiff will be using SHA256 for hashing which is a cryptographic hash and is considerably slower for hashing.
However hashing is not usually the bottleneck when dealing with big objects. Read more about DeepDiff :ref:`optimizations_label`

If you are running into trouble installing Murmur3, please take a look at the :ref:`troubleshoot_label` section.
Read about DeepDiff optimizations at :ref:`optimizations_label`


Importing
4 changes: 4 additions & 0 deletions docs/troubleshoot.rst
@@ -8,6 +8,10 @@ Troubleshoot
Murmur3 Installation
~~~~~~~~~~~~~~~~~~~~

NOTE: Murmur3 was removed from DeepDiff 5.2.0

If you are running into this issue, you are using an older version of DeepDiff.

`Failed to build mmh3 when installing DeepDiff`

DeepDiff prefers to use Murmur3 for hashing. However you have to manually install murmur3 by running: `pip install mmh3`