adding docs for math epsilon and group by
seperman committed Dec 31, 2020
1 parent a36282a commit a47c799
Showing 5 changed files with 74 additions and 2 deletions.
44 changes: 44 additions & 0 deletions docs/basics.rst
@@ -143,5 +143,49 @@ Object attribute added:
You just need to set view='tree' to get it in tree form.


.. _group_by_label:

Group By
--------

group_by can be used when dealing with a list of dictionaries to convert it into a dictionary of dictionaries grouped by the value of the key defined in group_by. A common use case is reading data from a flat CSV in which one of the columns is the primary key: we want to group the rows by that primary key instead of by their CSV row number.
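
The restructuring described above can be sketched in plain Python. The helper `group_rows_by` is hypothetical: it only mimics the grouped shape that DeepDiff's tree view shows for group_by='id' (rows keyed by their id, with the id field dropped) and is not DeepDiff's implementation. The CSV data is made up for illustration.

```python
import csv
import io

# A flat CSV in which the "id" column acts as the primary key (illustrative data).
CSV_TEXT = """id,name,last_name
AA,Joe,Nobody
BB,James,Blue
CC,Mike,Apple
"""


def group_rows_by(rows, key):
    """Hypothetical helper: turn a list of dicts into a dict keyed by row[key].

    The key field itself is removed from each row, matching the grouped
    structure DeepDiff shows for group_by='id'.
    """
    grouped = {}
    for row in rows:
        row = dict(row)  # copy so the caller's row is not mutated
        group_value = row.pop(key)
        grouped[group_value] = row
    return grouped


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
grouped = group_rows_by(rows, "id")
print(grouped)
# {'AA': {'name': 'Joe', 'last_name': 'Nobody'}, 'BB': {'name': 'James', 'last_name': 'Blue'}, 'CC': {'name': 'Mike', 'last_name': 'Apple'}}
```

In this sketch a later row with a duplicate id silently replaces an earlier one; how DeepDiff itself handles duplicate keys is not covered here.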

Example:

>>> from deepdiff import DeepDiff
>>> t1 = [
... {'id': 'AA', 'name': 'Joe', 'last_name': 'Nobody'},
... {'id': 'BB', 'name': 'James', 'last_name': 'Blue'},
... {'id': 'CC', 'name': 'Mike', 'last_name': 'Apple'},
... ]
>>>
>>> t2 = [
... {'id': 'AA', 'name': 'Joe', 'last_name': 'Nobody'},
... {'id': 'BB', 'name': 'James', 'last_name': 'Brown'},
... {'id': 'CC', 'name': 'Mike', 'last_name': 'Apple'},
... ]
>>>
>>> DeepDiff(t1, t2)
{'values_changed': {"root[1]['last_name']": {'new_value': 'Brown', 'old_value': 'Blue'}}}


Now we use group_by='id':

>>> DeepDiff(t1, t2, group_by='id')
{'values_changed': {"root['BB']['last_name']": {'new_value': 'Brown', 'old_value': 'Blue'}}}

.. note::
    group_by actually changes the structure of t1 and t2. You can see this by using the tree view:

    >>> diff = DeepDiff(t1, t2, group_by='id', view='tree')
    >>> diff
    {'values_changed': [<root['BB']['last_name'] t1:'Blue', t2:'Brown'>]}
    >>> diff['values_changed'][0]
    <root['BB']['last_name'] t1:'Blue', t2:'Brown'>
    >>> diff['values_changed'][0].up
    <root['BB'] t1:{'name': 'Ja...}, t2:{'name': 'Ja...}>
    >>> diff['values_changed'][0].up.up
    <root t1:{'AA': {'nam...}, t2:{'AA': {'nam...}>
    >>> diff['values_changed'][0].up.up.t1
    {'AA': {'name': 'Joe', 'last_name': 'Nobody'}, 'BB': {'name': 'James', 'last_name': 'Blue'}, 'CC': {'name': 'Mike', 'last_name': 'Apple'}}
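
Since the grouped structure is a dictionary keyed by id, rows are matched by their key rather than by their position. The minimal sketch below illustrates that consequence with t2 reordered. Both helpers, `group_rows_by` and `changed_paths`, are hypothetical and only imitate the path format of DeepDiff's output; they are not part of DeepDiff's API.

```python
def group_rows_by(rows, key):
    """Hypothetical helper: index rows by row[key], dropping that field."""
    return {row[key]: {k: v for k, v in row.items() if k != key} for row in rows}


def changed_paths(old, new):
    """Hypothetical, minimal comparison of two grouped dicts (two levels deep).

    For group keys present on both sides, compares each field of the old row
    with the new row and reports differences using DeepDiff-style path strings.
    """
    changes = {}
    for group_key in old.keys() & new.keys():
        for field, old_value in old[group_key].items():
            new_value = new[group_key].get(field)
            if new_value != old_value:
                path = f"root[{group_key!r}][{field!r}]"
                changes[path] = {"new_value": new_value, "old_value": old_value}
    return changes


t1 = [
    {"id": "AA", "name": "Joe", "last_name": "Nobody"},
    {"id": "BB", "name": "James", "last_name": "Blue"},
    {"id": "CC", "name": "Mike", "last_name": "Apple"},
]
# Same rows as t1, but reordered and with one changed value.
t2 = [
    {"id": "CC", "name": "Mike", "last_name": "Apple"},
    {"id": "AA", "name": "Joe", "last_name": "Nobody"},
    {"id": "BB", "name": "James", "last_name": "Brown"},
]

print(changed_paths(group_rows_by(t1, "id"), group_rows_by(t2, "id")))
# {"root['BB']['last_name']": {'new_value': 'Brown', 'old_value': 'Blue'}}
```

Without grouping, a position-based comparison of these two lists would pair unrelated rows after the reordering; keying by id keeps the report down to the one real change.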


Back to :doc:`/index`
2 changes: 2 additions & 0 deletions docs/conf.py
@@ -140,6 +140,8 @@
'github_count': True,
'font_family': 'Open Sans',
'canonical_url': 'https://zepworks.com/deepdiff/current/',
'page_width': '1024px',
'body_max_width': '1024px',
}

# Add any paths that contain custom themes here, relative to this directory.
5 changes: 4 additions & 1 deletion docs/diff_doc.rst
@@ -53,7 +53,7 @@ get_deep_distance: Boolean, default = False
:ref:`get_deep_distance_label` will get you the deep distance between objects. The distance is a number between 0 and 1 where zero means there is no diff between the 2 objects and 1 means they are very different. Note that this number should only be used to compare the similarity of 2 objects and nothing more. The algorithm for calculating this number may or may not change in future releases of DeepDiff.

group_by: String, default=None
:ref:`group_by_label` can be used when dealing with a list of dictionaries to convert it into a dictionary of dictionaries grouped by the value of the key defined in group_by. A common use case is reading data from a flat CSV in which one of the columns is the primary key: we want to group the rows by that primary key instead of by their CSV row number.

hasher: default = DeepHash.murmur3_128bit
Hash function to be used. If you don't want Murmur3, you can use Python's built-in hash function
@@ -105,6 +105,9 @@ max_passes: Integer, default = 10000000
max_diffs: Integer, default = None
:ref:`max_diffs_label` defines the maximum number of diffs to run on objects to pinpoint what exactly is different. This is only used when ignore_order=True.

math_epsilon: Decimal, default = None
:ref:`math_epsilon_label` uses Python's built-in math.isclose(). It defines a tolerance value that is passed to math.isclose() as abs_tol. Numbers within that tolerance of each other are not reported as different; numbers outside it are.

number_format_notation : string, default="f"
:ref:`number_format_notation_label` is what defines the meaning of significant digits. The default value of "f" means the digits AFTER the decimal point. "f" stands for fixed point. The other option is "e" which stands for exponent notation or scientific notation.

23 changes: 23 additions & 0 deletions docs/numbers.rst
@@ -119,6 +119,29 @@ ignore_nan_inequality: Boolean, default = False
>>> DeepDiff(float('nan'), float('nan'), ignore_nan_inequality=True)
{}

.. _math_epsilon_label:

Math Epsilon
------------

math_epsilon: Decimal, default = None
math_epsilon uses Python's built-in math.isclose(). It defines a tolerance value that is passed to math.isclose() as abs_tol. Numbers within that tolerance of each other are not reported as different; numbers outside it are.
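
For reference, the Python documentation describes math.isclose(a, b) as checking abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol), with rel_tol defaulting to 1e-09. The sketch below restates that condition in a hypothetical `within_tolerance` helper and checks it against math.isclose(), assuming math_epsilon is passed as abs_tol. If only abs_tol is passed, the default rel_tol still applies, so very large numbers can count as close even when their absolute difference exceeds abs_tol.

```python
import math


def within_tolerance(a, b, abs_tol, rel_tol=1e-09):
    """Hypothetical restatement of the condition math.isclose() documents."""
    return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)


pairs = [
    (7.175, 7.174),  # differ by about 0.001: within abs_tol
    (7.175, 7.15),   # differ by 0.025: outside abs_tol
    (1e12, 1e12 + 1),  # differ by 1, but pass via the default rel_tol
]
for a, b in pairs:
    # The hand-written check agrees with the standard library.
    assert within_tolerance(a, b, abs_tol=0.01) == math.isclose(a, b, abs_tol=0.01)
    print(a, b, math.isclose(a, b, abs_tol=0.01))
```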

For example, with some sensor data, derived and computed values must lie within a certain range, but it does not matter if they are off by, e.g., 1e-5.
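
The sensor scenario can be sketched with the standard library alone. The readings and the `out_of_tolerance` helper are hypothetical; the point is that a 1e-4 tolerance absorbs drift of about 1e-5 while still flagging a real deviation.

```python
import math

# Hypothetical readings: two computed values drift by about 1e-5,
# while "pressure" is off by 0.05.
expected = {"temp": 21.50000, "humidity": 0.40000, "pressure": 1013.25000}
computed = {"temp": 21.50001, "humidity": 0.39999, "pressure": 1013.30000}


def out_of_tolerance(expected, computed, abs_tol):
    """Hypothetical helper: sorted keys whose values differ by more than abs_tol."""
    return sorted(
        key
        for key in expected
        if not math.isclose(expected[key], computed[key], abs_tol=abs_tol)
    )


# Exact comparison flags every drifting value...
print(sorted(k for k in expected if expected[k] != computed[k]))
# ['humidity', 'pressure', 'temp']

# ...while a 1e-4 tolerance only flags the real deviation.
print(out_of_tolerance(expected, computed, abs_tol=1e-4))
# ['pressure']
```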

To check for that, Python's built-in math module provides the isclose() function, which evaluates whether two numbers are close to each other with reference to an epsilon (abs_tol). This is superior to using the format function, because it evaluates the numeric value rather than its string representation.
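
A small illustration of the difference between comparing formatted strings and comparing numeric values: two numbers that differ by only 0.0002 can still round to different two-decimal strings, while math.isclose() with a suitable abs_tol treats them as equal. The values are chosen for illustration only.

```python
import math

a, b = 0.0149, 0.0151  # differ by only 0.0002

# Rounding to two decimal places puts the numbers on opposite sides of a
# rounding boundary, so the formatted strings differ.
print(f"{a:.2f}", f"{b:.2f}")  # 0.01 0.02

# Comparing the numeric values with an absolute tolerance treats them as equal.
print(math.isclose(a, b, abs_tol=0.001))  # True
```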

Example:

>>> from deepdiff import DeepDiff
>>> from decimal import Decimal
>>> d1 = {"a": Decimal("7.175")}
>>> d2 = {"a": Decimal("7.174")}
>>> DeepDiff(d1, d2, math_epsilon=0.01)
{}
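
The comparison behind this example can be checked with the standard library directly. math.isclose() converts its arguments to float, so Decimal values are accepted. This sketch only exercises math.isclose() itself; it does not show DeepDiff's output for the tighter tolerance.

```python
import math
from decimal import Decimal

d1 = {"a": Decimal("7.175")}
d2 = {"a": Decimal("7.174")}

# The values differ by 0.001, which is within an absolute tolerance of 0.01.
print(math.isclose(d1["a"], d2["a"], abs_tol=0.01))    # True

# With a tighter tolerance the same pair is no longer considered close.
print(math.isclose(d1["a"], d2["a"], abs_tol=0.0001))  # False
```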

.. note::
    math_epsilon cannot currently handle the hashing of values, which is done when :ref:`ignore_order_label` is True.
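
One way to see why tolerance-based equality is hard to combine with hashing (an illustration, not DeepDiff's documented rationale): a hash-based lookup needs values that count as equal to land in the same bucket, but closeness is not transitive, and close values generally have different hashes.

```python
import math

EPS = 0.01
a, b, c = 0.0, 0.009, 0.018

# a is close to b, and b is close to c...
print(math.isclose(a, b, abs_tol=EPS), math.isclose(b, c, abs_tol=EPS))  # True True

# ...yet a is not close to c, so "closeness" is not an equivalence relation
# and cannot define consistent hash buckets.
print(math.isclose(a, c, abs_tol=EPS))  # False

# Plain hashes of the close values also differ, so hash-based matching
# would treat them as unrelated.
print(hash(a) == hash(b))  # False
```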


Performance Improvement of Numbers diffing
------------------------------------------

2 changes: 1 addition & 1 deletion tests/test_diff_math.py
@@ -1,5 +1,5 @@
from decimal import Decimal
from deepdiff.diff import DeepDiff
from deepdiff import DeepDiff


class TestDiffMath: