New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed a bug that showed when trying to establish equivalence of two a… #837
Conversation
…rrays that are numpy.recarrays The following code WILL run with the fix but will crash when running current xarray: import numpy as np import xarray as xr p_data = np.array([('John', 180), ('Stacy', 150), ('Dick',200)], dtype=[('name', '|S256'), ('height', int)]) p_data_1 = np.array([('John', 180), ('Stacy', 150), ('Dick',200)], dtype=[('name', '|S256'), ('height', int)]) weights_0 = xr.DataArray([80,56,120], dims=['participant'], coords={'participant':p_data}) weights_1 = xr.DataArray([81,52,115], dims=['participant'], coords={'participant':p_data_1}) print weights_1-weights_0
Thanks for looking into this! Just to summarize, the issue is that
The problem with your current patch is that it will actually compare all array elements twice. Because checks for equality are done quite often in xarray, this will have prohibitive performance cost. So, we'll either need to check for structured dtypes, or catch Also, we need tests, to verify that the bug is fixed. |
…rrays that are numpy.recarrays. Used try/except to catch exception that may arise when running isnull on a structured array. Added appropriate test to test_dataarray to check for correct handling of dimension that are structured arrays The following code WILL run with the fix but will crash when running current xarray: import numpy as np import xarray as xr p_data = np.array([('John', 180), ('Stacy', 150), ('Dick',200)], dtype=[('name', '|S256'), ('height', int)]) p_data_1 = np.array([('John', 180), ('Stacy', 150), ('Dick',200)], dtype=[('name', '|S256'), ('height', int)]) weights_0 = xr.DataArray([80,56,120], dims=['participant'], coords={'participant':p_data}) weights_1 = xr.DataArray([81,52,115], dims=['participant'], coords={'participant':p_data_1}) print weights_1-weights_0
I have rewrote array_equiv fcn from ops.py to use single comparison (the one which tests from null is inside try/except clause). Also added test case to test_DataArray.py |
@maciekswat awesome -- thanks! can you please run your code through a PEP8 checker like pep8 or pyflakes? some of your spacing is off. also, please add a note to "What's new" giving yourself credit for the bug fix :). |
|
||
return bool(flag_array.all()) | ||
|
||
# return bool(((arr1 == arr2) | (isnull(arr1) & isnull(arr2))).all()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove commented out lines
just fixed the formatting according to PEP8 guidelines |
from . import (TestCase, ReturnItem, source_ndarray, unittest, requires_dask, | ||
# from . import (TestCase, ReturnItem, source_ndarray, unittest, requires_dask, | ||
# requires_bottleneck) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this commented out code.
I simplified the test - please feel free to do further edits, also added doc string explaining what the test checks for - essentially it checks if DataArray subtraction behaves correctly when dimension is a structured numpy array. |
|
||
flag_array |= (isnull(arr1) & isnull(arr2)) | ||
|
||
except TypeError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add a brief comment explaining why we allow TypeError
here? a reference to this issue (e.g., GH837
)` could be nice, too.
…snull on structured array that happens to be one of the dims of DataArray
Added a comment explaining why we need to catch typeError in array_equiv |
@maciekswat - this is almost there. @shoyer and I each have comments remaining that should be addressed and then we need a bug fix entry in "what's new". Thanks! |
Added bugfix entry in what's new |
This looks good to me. The test could probably still be made more succinct but this covers the essentials. I like that it provides an integration test for using rec arrays. @jhamman any more comments before I merge? |
Merged via 0d0ae9d |
…rrays that are numpy.recarrays
The following code WILL run with the fix but will crash when running current xarray: