Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed a bug that showed when trying to establish equivalence of two a… #837

Closed
wants to merge 8 commits into from

Conversation

maciekswat
Copy link
Contributor

@maciekswat maciekswat commented Apr 27, 2016

…rrays that are numpy.recarrays

The following code WILL run with the fix but will crash when running current xarray:

import numpy as np
import xarray as xr

p_data = np.array([('John', 180), ('Stacy', 150), ('Dick',200)], dtype=[('name', '|S256'), ('height', int)])
p_data_1 = np.array([('John', 180), ('Stacy', 150), ('Dick',200)], dtype=[('name', '|S256'), ('height', int)])

weights_0 = xr.DataArray([80,56,120], dims=['participant'], coords={'participant':p_data})
weights_1 = xr.DataArray([81,52,115], dims=['participant'], coords={'participant':p_data_1})

print weights_1-weights_0

…rrays that are numpy.recarrays

The following code WILL run with the fix but will crash when running current xarray:

import numpy as np
import xarray as xr

p_data = np.array([('John', 180), ('Stacy', 150), ('Dick',200)], dtype=[('name', '|S256'), ('height', int)])
p_data_1 = np.array([('John', 180), ('Stacy', 150), ('Dick',200)], dtype=[('name', '|S256'), ('height', int)])

weights_0 = xr.DataArray([80,56,120], dims=['participant'], coords={'participant':p_data})
weights_1 = xr.DataArray([81,52,115], dims=['participant'], coords={'participant':p_data_1})

print weights_1-weights_0
@shoyer
Copy link
Member

shoyer commented Apr 28, 2016

Thanks for looking into this!

Just to summarize, the issue is that isnull doesn't work on structured dtypes:

>>> pd.isnull(p_data)
TypeError: Not implemented for this type

The problem with your current patch is that it will actually compare all array elements twice. Because checks for equality are done quite often in xarray, this will have prohibitive performance cost.

So, we'll either need to check for structured dtypes, or catch TypeError exceptions and try doing the comparison without checking for nulls.

Also, we need tests, to verify that the bug is fixed.

…rrays that are numpy.recarrays. Used try/except to catch exception that may arise when running isnull on a structured array. Added appropriate test to test_dataarray to check for correct handling of dimension that are structured arrays

The following code WILL run with the fix but will crash when running current xarray:

import numpy as np
import xarray as xr

p_data = np.array([('John', 180), ('Stacy', 150), ('Dick',200)], dtype=[('name', '|S256'), ('height', int)])
p_data_1 = np.array([('John', 180), ('Stacy', 150), ('Dick',200)], dtype=[('name', '|S256'), ('height', int)])

weights_0 = xr.DataArray([80,56,120], dims=['participant'], coords={'participant':p_data})
weights_1 = xr.DataArray([81,52,115], dims=['participant'], coords={'participant':p_data_1})

print weights_1-weights_0
@maciekswat
Copy link
Contributor Author

I have rewrote array_equiv fcn from ops.py to use single comparison (the one which tests from null is inside try/except clause). Also added test case to test_DataArray.py

@shoyer
Copy link
Member

shoyer commented May 2, 2016

@maciekswat awesome -- thanks!

can you please run your code through a PEP8 checker like pep8 or pyflakes? some of your spacing is off.

also, please add a note to "What's new" giving yourself credit for the bug fix :).


return bool(flag_array.all())

# return bool(((arr1 == arr2) | (isnull(arr1) & isnull(arr2))).all())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove commented out lines

@maciekswat
Copy link
Contributor Author

just fixed the formatting according to PEP8 guidelines

from . import (TestCase, ReturnItem, source_ndarray, unittest, requires_dask,
# from . import (TestCase, ReturnItem, source_ndarray, unittest, requires_dask,
# requires_bottleneck)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this commented out code.

@maciekswat
Copy link
Contributor Author

I simplified the test - please feel free to do further edits, also added doc string explaining what the test checks for - essentially it checks if DataArray subtraction behaves correctly when dimension is a structured numpy array.


flag_array |= (isnull(arr1) & isnull(arr2))

except TypeError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add a brief comment explaining why we allow TypeError here? a reference to this issue (e.g., GH837)` could be nice, too.

…snull on structured array that happens to be one of the dims of DataArray
@maciekswat
Copy link
Contributor Author

Added a comment explaining why we need to catch typeError in array_equiv

@jhamman
Copy link
Member

jhamman commented May 11, 2016

@maciekswat - this is almost there. @shoyer and I each have comments remaining that should be addressed and then we need a bug fix entry in "what's new". Thanks!

@maciekswat
Copy link
Contributor Author

Added bugfix entry in what's new

@shoyer
Copy link
Member

shoyer commented Jun 1, 2016

This looks good to me. The test could probably still be made more succinct but this covers the essentials. I like that it provides an integration test for using rec arrays.

@jhamman any more comments before I merge?

@shoyer
Copy link
Member

shoyer commented Jun 24, 2016

Merged via 0d0ae9d

@shoyer shoyer closed this Jun 24, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants