Delta + Numpy structure throws ambiguous true value #194
Comments
@David-Herman Thanks for reporting the issue. This is happening because the shapes of the two arrays are different. I'm fixing it now.
thanks
@David-Herman This is fixed in the dev branch now. I added your example as a test case.
Hello, thanks for the quick commit. I believe the code is causing a regression somewhere that leads to poor performance (an infinite loop?). I have two HDF5 (.mat) files, around 15-20 MB in size, that unpack into Python objects via scipy.io.loadmat(). I cannot provide the raw data as it is confidential. Please let me know if I can assist further with the debugging. When I install the prior dev commit (the one that fails due to the above issue), I get the following.
That returns
When I install the current head of dev I get the following results,
That increasingly uses up free memory; I killed it at 20 GB of usage for the Python process. I did confirm that for the simple example, when I interrupt the process, I get:
Hi @David-Herman
Also, this is probably happening because we now cache almost everything in order to increase performance. It sounds like your data grows the cache without enough cache hits to justify it. I'm going to put some limits on the cache size.
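The kind of size limit described above can be sketched with just the standard library. The class and parameter names below are illustrative, not DeepDiff's actual implementation — a minimal LRU cache that evicts the least recently used entry once it exceeds `max_size`:

```python
from collections import OrderedDict


class BoundedCache:
    """A simple LRU cache that evicts the oldest entry once max_size is exceeded."""

    def __init__(self, max_size=5000):
        self.max_size = max_size
        self._store = OrderedDict()

    def get(self, key, default=None):
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]
        return default

    def set(self, key, value):
        self._store[key] = value
        self._store.move_to_end(key)
        if len(self._store) > self.max_size:
            self._store.popitem(last=False)  # evict the least recently used entry


cache = BoundedCache(max_size=2)
cache.set("a", 1)
cache.set("b", 2)
cache.set("c", 3)  # pushes the cache over max_size, so "a" is evicted
print(cache.get("a"))  # None
print(cache.get("c"))  # 3
```

Bounding the cache this way trades some cache hits for a hard ceiling on memory, which is exactly the failure mode reported here.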
Hi @David-Herman
Here are some of my results; thanks for the help. Here is the install:
Without ignore_order, the memory usage balloons.
The memory size is reasonable, but the speed is slow.
Here is the output
How do I set the cache_size?
edit: I uninstalled and then reinstalled from dev to get the last commit. I set cache_size = 1 and still saw increasing memory use.
Ok, I have been troubleshooting further with sub-structures of my data structure. I am not encountering issues with memory size at present. What I have noticed is that when ignore_order=True is used in the function call, it runs slightly slower (e.g. 0.0119 vs 0.0139 seconds) for most of the data. Then on some data, ignore_order=True increases the time to 51 seconds, compared to 0.02 seconds for the DeepDiff(,) call. This sub-structure tends to be numpy arrays (500-2000 elements) with dtypes like uint16 ('<u2'). I have tried to generate sample data via np.arange(); that ends up with an 18.5 ms vs 53.1 ms difference with ignore_order=True. I used np.savetxt() to save two arrays that seem to be causing an issue. The difference is pretty clear.
compared to:
Hi @David-Herman |
Very interesting how long it takes to run this. Let's open a new ticket and continue there since this ticket was about the delta + numpy error. |
Describe the bug
I am using the delta of the diff to regenerate the other input. When using numpy arrays as input (or dictionaries of lists of arrays, etc.), it throws an exception: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
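The exception itself comes from NumPy's refusal to coerce a multi-element array to a single boolean, which is easy to reproduce in isolation:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2, 3])

try:
    if a == b:  # elementwise comparison yields an array, not a bool
        pass
except ValueError as e:
    print(e)  # "The truth value of an array with more than one element is ambiguous..."

# The unambiguous forms NumPy suggests:
print((a == b).all())  # True: every element matches
print((a == b).any())  # True: at least one element matches
```

Any library code that does a plain `if x == y:` on values that may be numpy arrays hits this, which is why Delta needed an array-aware comparison.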
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Return of a data structure matching numpy array a2.
OS, DeepDiff version and Python version (please complete the following information):