DeepDiff to run in multiple passes to diff combinations of results when ignore_order=True #136
BTW, it may NOT be just two passes; this needs to be addressed at multiple levels of the hierarchy.
@seperman do you need some "real" test-data to play around with?
Hi @testautomation, tagging @nkaliape too.
@seperman first thing I noticed is that numpy is required now; solved by installing it manually. Maybe numpy should be added as a dependency so that it is installed automatically when installing deepdiff.
Looks definitely better than before (where I just had a big "iterable added" plus another big "iterable removed"). Now the diff looks much better:
data and settings used
# exclude_obj_callback function
def ignore_type_properties(obj, path):
    ignorable_types = [
        "ARCHETYPE_ID",
        "ARCHETYPED",
        "CODE_PHRASE",
        "DV_BOOLEAN",
        "DV_CODED_TEXT",
        "DV_COUNT",
        "DV_DATE",
        "DV_DATE_TIME",
        "DV_DURATION",
        "DV_EHR_URI",
        "DV_IDENTIFIER",
        "DV_MULTIMEDIA",
        "DV_ORDINAL",
        "DV_PARSABLE",
        "DV_PROPORTION",
        "DV_QUANTITY",
        "DV_SCALE",
        "DV_STATE",
        "DV_TEXT",
        "DV_TIME",
        "DV_URI",
        "REFERENCE_RANGE",
        "TEMPLATE_ID",
        "TERM_MAPPING",
        "TERMINOLOGY_ID",
    ]
    return "_type" in path and obj in ignorable_types
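As a standalone sanity check of that callback's logic (pure Python, no DeepDiff needed; the sample path strings mimic DeepDiff's `root[...]` notation and are made up for illustration):

```python
def ignore_type_properties(obj, path):
    # Abbreviated ignore list for the demo; the full list is above.
    ignorable_types = {"CODE_PHRASE", "DV_TEXT", "DV_QUANTITY"}
    return "_type" in path and obj in ignorable_types

# DeepDiff calls the callback with the leaf value and its path string;
# returning True excludes that node from the diff.
print(ignore_type_properties("DV_TEXT", "root['items'][0]['_type']"))       # True
print(ignore_type_properties("DV_TEXT", "root['items'][0]['value']"))       # False
print(ignore_type_properties("UNKNOWN_TYPE", "root['items'][0]['_type']"))  # False
```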
@seperman I may have found an issue, not 100% sure yet, but it seems like my tests run much slower now. Edit: the test just finished (successfully) after 1 h and 8 minutes 😱 I'll try to figure out which data set causes the slowdown.
Here are the test results: test_report_and_log.zip, but don't put too much weight on them yet. I'll have to repeat the whole procedure because my VM could have been the root cause; it was running out of disk space 🙈
Repeated the test w/ more RAM and disk space. Same result. But I think the slowdown is reasonable because there is much more going on under the hood now. So it's not an issue of the changes in v5 but simply due to the fact that the diff is HUGE. BTW, Robot's XML file (which is written during test execution) grew to over 1 GB. Also the way I wrapped DeepDiff in Robot and the logging that I do may be a factor.
I'll extract the generated test data from my test log to make it easier to test with DeepDiff directly (w/o other parts like Robot or "writing to XML" involved).
@seperman Is there a way to abort the comparison when, let's say, "enough" diffs were recognized? Something like a diff limit?
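Nothing like that is used here, but the idea behind such a diff limit can be sketched in plain Python (hypothetical helper, not DeepDiff API): bail out of the recursion once enough differences have been collected.

```python
class DiffLimitReached(Exception):
    """Raised when the maximum number of differences has been collected."""

def limited_diff(a, b, path="root", limit=10, diffs=None):
    # Toy recursive dict diff with an abort threshold; a sketch of the
    # "diff limit" idea only, not how DeepDiff works internally.
    if diffs is None:
        diffs = []
    if len(diffs) >= limit:
        raise DiffLimitReached(diffs)  # abort: enough diffs found
    if isinstance(a, dict) and isinstance(b, dict):
        for key in set(a) | set(b):
            sub = f"{path}[{key!r}]"
            if key not in a:
                diffs.append((sub, "added"))
            elif key not in b:
                diffs.append((sub, "removed"))
            else:
                limited_diff(a[key], b[key], sub, limit, diffs)
    elif a != b:
        diffs.append((path, "changed"))
    return diffs

try:
    result = limited_diff({"a": 1, "b": 2, "c": 3}, {"a": 9, "b": 9, "c": 9}, limit=2)
except DiffLimitReached as exc:
    result = exc.args[0]
print(len(result))  # 2 -- aborted before finding the third difference
```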
Thanks for the very useful information.
DeepDiff now runs recursively between any diffs it finds to see if it can pinpoint the actual difference, so it is way slower than before.
Numpy shouldn’t be required. Thanks for reporting it.
I will look into some optimizations.
We can add a parameter for the max passes to run. Currently that is the max recursion depth allowed.
Have you installed murmur3? It is in the docs. It should increase the CPU usage but dramatically decrease the memory usage.
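For context, the win from hashing comes from matching identical items by digest instead of holding full deep comparisons in memory. A standalone sketch of that idea, using hashlib's md5 purely as a stand-in for murmur3 (illustration only, not DeepDiff's internals):

```python
import hashlib
import json

def item_hash(obj):
    # Stand-in for murmur3: digest a canonical JSON serialization so
    # identical items can be matched by hash instead of deep comparison.
    payload = json.dumps(obj, sort_keys=True).encode()
    return hashlib.md5(payload).hexdigest()

a = [{"x": 1}, {"x": 2}]
b = [{"x": 2}, {"x": 3}]
hashes_a = {item_hash(i) for i in a}
hashes_b = {item_hash(i) for i in b}

# Items present on only one side fall out of a cheap set difference.
only_in_a = [i for i in a if item_hash(i) not in hashes_b]
only_in_b = [i for i in b if item_hash(i) not in hashes_a]
print(only_in_a, only_in_b)  # [{'x': 1}] [{'x': 3}]
```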
Also, have you tried using pypy3 to run the diff? I'm curious whether you will gain any speed.
I will keep you posted once I do some tests.
Thanks
Sep Dehpour
Here is the part I identified to have the most impact in my tests. I've extracted the relevant data from it: here as .txt for a quick look (huge_actual.txt), and here as .json. The latter you can quickly take into use after extracting it into a folder:
import json
from deepdiff import DeepDiff

actual = json.load(open('huge_actual.json'))
expected = json.load(open('huge_expected.json'))

# this is fast
diff = DeepDiff(actual, expected)

# gets dramatically slower w/ ignore_order
diff = DeepDiff(actual, expected, ignore_order=True)
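That slowdown is the expected cost of ignoring order: there is no positional pairing, so in the worst case every item on one side is compared against every unmatched item on the other, and each comparison may itself be a deep diff for nested data. A rough standalone illustration (not DeepDiff's actual algorithm), counting pairwise comparisons:

```python
def diff_unordered(list1, list2):
    # Match items ignoring order by pairwise comparison. Worst case
    # len(list1) * len(list2) comparisons -- quadratic in list size.
    comparisons = 0
    remaining = list(list2)  # candidates in list2 not yet matched
    removed = []
    for item in list1:
        for i, other in enumerate(remaining):
            comparisons += 1
            if item == other:
                del remaining[i]  # consume the match
                break
        else:
            removed.append(item)  # no match anywhere in list2
    # leftovers in remaining were added; unmatched list1 items were removed
    return remaining, removed, comparisons

added, removed, n = diff_unordered([1, 2, 3], [3, 2, 4])
print(added, removed, n)  # [4] [1] 6
```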
That would be really great!
Not yet. I can give it a shot, but I have to make sure that it also works on CI. Same w/ pypy3. I'll let you know if I can report something interesting about that. Cheers
DeepDiff 5 is finally here and it comes with a multiple-passes option!
DeepDiff should run in 2 passes and diff combinations of results when ignore_order=True.
Example:
Currently:
But if DeepDiff compared the items between "iterable item added" and "iterable item removed", it should spit out the following results instead:
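The requested second pass can be sketched in plain Python (hypothetical helper, not DeepDiff's real implementation): pair each removed item with the most similar added item, then report field-level changes instead of a whole-item add/remove.

```python
def similarity(a, b):
    # Fraction of keys whose values match between two dicts.
    keys = set(a) | set(b)
    if not keys:
        return 1.0
    return sum(a.get(k) == b.get(k) for k in keys) / len(keys)

def second_pass(removed, added):
    # Greedily pair each removed dict with its most similar added dict
    # and report per-key (old, new) changes for that pair.
    added = list(added)  # copy so we can consume matches
    results = []
    for old in removed:
        if not added:
            results.append({"removed": old})  # nothing left to pair with
            continue
        best = max(added, key=lambda new: similarity(old, new))
        added.remove(best)
        changes = {k: (old.get(k), best.get(k))
                   for k in set(old) | set(best)
                   if old.get(k) != best.get(k)}
        results.append(changes)
    return results

removed = [{"id": 1, "name": "Joe", "age": 20}]
added = [{"id": 1, "name": "Joe", "age": 21}]
print(second_pass(removed, added))  # [{'age': (20, 21)}]
```

The pairing heuristic (most shared key/value pairs) is a design choice made here for the sketch; any similarity measure that finds the "closest" counterpart would serve the same purpose.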