diff --git a/AUTHORS b/AUTHORS.md
similarity index 100%
rename from AUTHORS
rename to AUTHORS.md
diff --git a/README.md b/README.md
index 59085626..727127c6 100644
--- a/README.md
+++ b/README.md
@@ -27,14 +27,6 @@ Tested on Python 3.6+ and PyPy3.
 `pip install deepdiff`
-DeepDiff prefers to use Murmur3 for hashing. However you have to manually install Murmur3 by running:
-
-`pip install 'deepdiff[murmur]'`
-
-Otherwise DeepDiff will be using SHA256 for hashing which is a cryptographic hash and is considerably slower.
-
-If you are running into trouble installing Murmur3, please take a look at the [Troubleshoot](#troubleshoot) section.
-
 ### Importing
 ```python
@@ -399,23 +391,6 @@ And here is more info:
-# Troubleshoot
-
-## Murmur3
-
-`Failed to build mmh3 when installing DeepDiff`
-
-DeepDiff prefers to use Murmur3 for hashing. However you have to manually install murmur3 by running: `pip install mmh3`
-
-On MacOS Mojave some user experience difficulty when installing Murmur3.
-
-The problem can be solved by running:
-
-`xcode-select --install`
-
-And then running
-
-`pip install mmh3`
 # ChangeLog
@@ -444,34 +419,4 @@ Thank you!
 # Authors
-Authors listed in the order of the contributions:
-
-- [Sep Dehpour (Seperman)](http://www.zepworks.com)
-- [Victor Hahn Castell](http://hahncastell.de) for the tree view and major contributions:
-- [nfvs](https://github.com/nfvs) for Travis-CI setup script.
-- [brbsix](https://github.com/brbsix) for initial Py3 porting.
-- [WangFenjin](https://github.com/WangFenjin) for unicode support.
-- [timoilya](https://github.com/timoilya) for comparing list of sets when ignoring order.
-- [Bernhard10](https://github.com/Bernhard10) for significant digits comparison.
-- [b-jazz](https://github.com/b-jazz) for PEP257 cleanup, Standardize on full names, fixing line endings.
-- [finnhughes](https://github.com/finnhughes) for fixing __slots__
-- [moloney](https://github.com/moloney) for Unicode vs. Bytes default
-- [serv-inc](https://github.com/serv-inc) for adding help(deepdiff)
-- [movermeyer](https://github.com/movermeyer) for updating docs
-- [maxrothman](https://github.com/maxrothman) for search in inherited class attributes
-- [maxrothman](https://github.com/maxrothman) for search for types/objects
-- [MartyHub](https://github.com/MartyHub) for exclude regex paths
-- [sreecodeslayer](https://github.com/sreecodeslayer) for DeepSearch match_string
-- Brian Maissy [brianmaissy](https://github.com/) for weakref fix, enum tests
-- Bartosz Borowik [boba-2](https://github.com/boba-2) for Exclude types fix when ignoring order
-- Brian Maissy [brianmaissy](https://github.com/brianmaissy) for fixing classes which inherit from classes with slots didn't have all of their slots compared
-- Juan Soler [Soleronline](https://github.com/Soleronline) for adding ignore_type_number
-- [mthaddon](https://github.com/mthaddon) for adding timedelta diffing support
-- [Necrophagos](https://github.com/Necrophagos) for Hashing of the number 1 vs. True
-- [gaal-dev](https://github.com/gaal-dev) for adding exclude_obj_callback
-- Ivan Piskunov [van-ess0](https://github.com/van-ess0) for deprecation warning enhancement.
-- Michał Karaś [MKaras93](https://github.com/MKaras93) for the pretty view
-- Christian Kothe [chkothe](https://github.com/chkothe) for the basic support for diffing numpy arrays
-- [Timothy](https://github.com/timson) for truncate_datetime
-- [d0b3rm4n](https://github.com/d0b3rm4n) for bugfix to not apply format to non numbers.
-- [MyrikLD](https://github.com/MyrikLD) for Bug Fix NoneType in ignore type groups
+Please take a look at the [AUTHORS](AUTHORS.md) file.
diff --git a/deepdiff/deephash.py b/deepdiff/deephash.py
index 3f562732..4a9445fa 100644
--- a/deepdiff/deephash.py
+++ b/deepdiff/deephash.py
@@ -14,7 +14,6 @@ logger = logging.getLogger(__name__)
 UNPROCESSED_KEY = 'unprocessed'
-MURMUR_SEED = 1203
 RESERVED_DICT_KEYS = {UNPROCESSED_KEY}
 EMPTY_FROZENSET = frozenset()
@@ -47,6 +46,7 @@ def combine_hashes_lists(items, prefix):
     Combines lists of hashes into one hash
     This can be optimized in future.
     It needs to work with both murmur3 hashes (int) and sha256 (str)
+    Although murmur3 is not used anymore.
     """
     if isinstance(prefix, bytes):
         prefix = prefix.decode('utf-8')
diff --git a/docs/deephash_doc.rst b/docs/deephash_doc.rst
index ada44617..b90d5f28 100644
--- a/docs/deephash_doc.rst
+++ b/docs/deephash_doc.rst
@@ -7,10 +7,7 @@ The main usage of DeepHash is to calculate the hash of otherwise unhashable obje
 For example you can use DeepHash to calculate the hash of a set or a dictionary!
 At the core of it, DeepHash is a deterministic serialization of your object into a string so it
-can be passed to a hash function. By default it uses Murmur 3 128 bit hash function which is a
-fast, non-cryptographic hashing function. You have the option to pass any another hashing function to be used instead.
-
-If it can't find Murmur3 package (mmh3) installed, it uses Python's built-in SHA256 for hashing which is considerably slower than Murmur3. So it is advised that you install Murmur3 by running `pip install 'deepdiff[murmur]`
+can be passed to a hash function. By default it uses SHA256. You have the option to pass any another hashing function to be used instead.
 **Import**
     >>> from deepdiff import DeepHash
@@ -39,20 +36,21 @@ exclude_obj_callback
     A function that takes the object and its path and returns a Boolean. If True is returned, the object is excluded from the results, otherwise it is included.
     This is to give the user a higher level of control than one can achieve via exclude_paths, exclude_regex_paths or other means.
-hasher: function. default = DeepHash.murmur3_128bit
-    hasher is the hashing function. The default is DeepHash.murmur3_128bit.
+hasher: function. default = DeepHash.sha256hex
+    hasher is the hashing function. The default is DeepHash.sha256hex.
     But you can pass another hash function to it if you want.
     For example a cryptographic hash function or Python's builtin hash function.
     All it needs is a function that takes the input in string format and returns the hash.
     You can use it by passing: hasher=hash for Python's builtin hash.
-    The following alternatives are already provided:
+    The following alternative is already provided:
-    - hasher=DeepHash.murmur3_128bit
-    - hasher=DeepHash.murmur3_64bit
     - hasher=DeepHash.sha1hex
+    Note that prior to DeepDiff 5.2, Murmur3 was the default hash function.
+    But Murmur3 is removed from DeepDiff dependencies since then.
+
 ignore_repetition: Boolean, default = True
     If repetitions in an iterable should cause the hash of iterable to be different.
     Note that the deepdiff diffing functionality lets this to be the default at all times.
@@ -165,10 +163,9 @@ But with DeepHash:
     At first it might seem weird why DeepHash(obj)[obj] but remember that DeepHash(obj) is a dictionary of hashes of all other objects that obj contains too.
-    The result hash is 34150898645750099477987229399128149852 which is generated by
-    Murmur 3 128bit hashing algorithm. If you prefer to use another hashing algorithm, you can pass it using the hasher parameter. Read more about Murmur3 here: https://en.wikipedia.org/wiki/MurmurHash
+    The result hash is 34150898645750099477987229399128149852. If you prefer to use another hashing algorithm, you can pass it using the hasher parameter.
-    If you do a deep copy of obj, it should still give you the same hash:
+    If you do a deep copy of the obj, it should still give you the same hash:
     >>> from copy import deepcopy
     >>> obj2 = deepcopy(obj)
diff --git a/docs/diff_doc.rst b/docs/diff_doc.rst
index 72153e47..a6a13316 100644
--- a/docs/diff_doc.rst
+++ b/docs/diff_doc.rst
@@ -57,8 +57,8 @@ get_deep_distance: Boolean, default = False
 group_by: String, default=None
     :ref:`group_by_label` can be used when dealing with list of dictionaries to convert them to group them by value defined in group_by. The common use case is when reading data from a flat CSV and primary key is one of the columns in the CSV. We want to use the primary key to group the rows instead of CSV row number.
-hasher: default = DeepHash.murmur3_128bit
-    Hash function to be used. If you don't want Murmur3, you can use Python's built-in hash function
+hasher: default = DeepHash.sha256hex
+    Hash function to be used. If you don't want SHA256, you can use your own hash function
     by passing hasher=hash. This is for advanced usage and normally you don't need to modify it.
 ignore_order : Boolean, default=False
diff --git a/docs/index.rst b/docs/index.rst
index 398b0ed9..78ed05de 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -49,14 +49,7 @@ Install from PyPi::
     pip install deepdiff
-DeepDiff prefers to use Murmur3 for hashing. However you need to manually install Murmur3 by running::
-
-    pip install 'deepdiff[murmur]'
-
-Otherwise DeepDiff will be using SHA256 for hashing which is a cryptographic hash and is considerably slower for hashing.
-However hashing is not usually the bottleneck when dealing with big objects. Read more about DeepDiff :ref:`optimizations_label`
-
-If you are running into trouble installing Murmur3, please take a look at the :ref:`troubleshoot_label` section.
+Read about DeepDiff optimizations at :ref:`optimizations_label`
 Importing
diff --git a/docs/troubleshoot.rst b/docs/troubleshoot.rst
index 4eb86017..f3ae34a8 100644
--- a/docs/troubleshoot.rst
+++ b/docs/troubleshoot.rst
@@ -8,6 +8,10 @@ Troubleshoot
 Murmur3 Installation
 ~~~~~~~~~~~~~~~~~~~~
+NOTE: Murmur3 was removed from DeepDiff 5.2.0
+
+If you are running into this issue, you are using an older version of DeepDiff.
+
 `Failed to build mmh3 when installing DeepDiff`
 DeepDiff prefers to use Murmur3 for hashing. However you have to manually install murmur3 by running: `pip install mmh3`