fill_value in shift #2470

Merged
merged 17 commits into from Dec 27, 2018

Conversation

max-sixty
Collaborator

@max-sixty max-sixty commented Oct 6, 2018

  • Closes #2451 (Shift changes non-float arrays to object, even for shift=0)
  • Tests added
  • Tests passed
  • Fully documented, including whats-new.rst for all changes and api.rst for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)

Should we be more defensive about which fill_values can be passed? Currently, if the array and fill values have incompatible dtypes, we don't preemptively warn or cast, apart from the case of np.nan, which then uses the default filler.
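For readers following the thread, here is a minimal NumPy sketch of the behaviour being proposed, not the xarray implementation itself. The helper name `shift_1d` and the `_NA` sentinel are hypothetical stand-ins (the PR uses `dtypes.NA`), and the sketch only covers 1-D integer and float arrays. With the default filler the integer array is promoted to float so it can hold NaN; with an explicit `fill_value` the original dtype is kept.

```python
import numpy as np

_NA = object()  # hypothetical sentinel standing in for xarray's dtypes.NA


def shift_1d(values, count, fill_value=_NA):
    """Shift a 1-D array by ``count`` positions, filling the vacated slots."""
    values = np.asarray(values)
    if fill_value is _NA:
        # Default filler: promote integers to float so NaN can be stored.
        dtype = np.dtype(np.float64) if values.dtype.kind in "iu" else values.dtype
        fill_value = np.nan
    else:
        # Explicit filler: keep the array's dtype unchanged.
        dtype = values.dtype

    n = values.shape[0]
    count = max(-n, min(n, count))  # shifting further than n empties the array
    result = np.full(n, fill_value, dtype=dtype)
    if count > 0:
        result[count:] = values[:n - count]
    elif count < 0:
        result[:n + count] = values[-count:]
    else:
        result[:] = values
    return result


default_fill = shift_1d(np.array([1, 2, 3]), 1)
explicit_fill = shift_1d(np.array([1, 2, 3]), 1, fill_value=0)
```

Here `default_fill` is a float array whose first element is NaN, while `explicit_fill` stays an integer array.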

@pep8speaks

pep8speaks commented Oct 6, 2018

Hello @max-sixty! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 27, 2018 at 22:49 Hours UTC

-        ds = self._to_temp_dataset().shift(shifts=shifts, **shifts_kwargs)
-        return self._from_temp_dataset(ds)
+        return self._replace(variable=self.variable.shift(
+            shifts=shifts, fill_value=fill_value, **shifts_kwargs))
Collaborator Author

Is this a reasonable change for DataArray operations that only change their data?

Member

I think this is 100% equivalent -- I don't really care either way. But perhaps splitting this into two statements would be clearer.

Member

@shoyer shoyer left a comment

Looks great -- I have only a few minor points.

xarray/core/dataset.py (outdated)
@@ -940,7 +940,11 @@ def _shift_one_dim(self, dim, count):
keep = slice(None)

trimmed_data = self[(slice(None),) * axis + (keep,)].data
dtype, fill_value = dtypes.maybe_promote(self.dtype)

if fill_value is None or fill_value is np.nan:
Member

Use dtypes.NA as the default value for fill_value, and then copy these lines from pad to ensure that this works for arbitrary fill_values such as None:
https://github.com/pydata/xarray/blob/master/xarray/core/variable.py#L1012-L1015

Collaborator Author

I saw that, but it means that if someone passes np.nan (the value, not the dtypes.NA default) to an int dtype, I get -9223372036854775808. Should we raise in that case? Otherwise I'll see what I can do to coerce.

Member

Hmm. I'm pretty sure this would qualify as a NumPy bug :(.

I'm mostly concerned with duplicate logic that works slightly differently, so if you want to try doing this differently I'm open to it. It might actually make sense to replace most of this method with a call to Variable.pad_with_fill_value().
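The idea of expressing a shift as "trim, then pad" can be sketched with plain NumPy. This is only an illustration under stated assumptions: it uses `np.pad` rather than xarray's `Variable.pad_with_fill_value()`, the helper name `shift_via_pad` is hypothetical, and it handles 1-D arrays only.

```python
import numpy as np


def shift_via_pad(values, count, fill_value):
    """Shift a 1-D array by trimming one end and padding the other."""
    values = np.asarray(values)
    n = values.shape[0]
    count = max(-n, min(n, count))  # clamp so the trim never wraps around
    if count == 0:
        return values.copy()
    if count > 0:
        # Drop the last `count` items, then pad on the left.
        trimmed, pad_width = values[:n - count], (count, 0)
    else:
        # Drop the first `-count` items, then pad on the right.
        trimmed, pad_width = values[-count:], (0, -count)
    return np.pad(trimmed, pad_width, mode="constant",
                  constant_values=fill_value)


shifted_right = shift_via_pad(np.array([1, 2, 3, 4]), 2, 0)
shifted_left = shift_via_pad(np.array([1, 2, 3, 4]), -1, 9)
```

Routing the fill through a single padding routine means any dtype handling lives in one place, which is the duplication concern raised above.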

Collaborator Author

@max-sixty max-sixty Oct 8, 2018

> I'm mostly concerned with duplicate logic that works slightly differently

💯

Great, I'll have a look and report back

@dcherian dcherian mentioned this pull request Oct 24, 2018
5 tasks
if fill_value is dtypes.NA and True:
    dtype, fill_value = dtypes.maybe_promote(self.dtype)
else:
    dtype = self.dtype
Member

What happens if the filler is not compatible with self.dtype?
For example, feeding np.nan to an int array.
This is probably the user's responsibility and we do not need to handle it, but I am curious about it.

Member

In theory, NumPy should raise an error... But it may not.
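A small check of the two NumPy code paths relevant here, as a sketch rather than a statement about what xarray does internally. Element assignment of NaN into an integer array raises, whereas an explicit cast does not; the integer that the cast produces is not meaningful and may differ between platforms and NumPy versions, so it is not asserted on.

```python
import numpy as np

ints = np.array([1, 2, 3])

# Element assignment refuses NaN for an integer array.
try:
    ints[0] = np.nan
    assignment_raised = False
except ValueError:
    assignment_raised = True

# An explicit cast does not raise. Newer NumPy versions emit a
# RuntimeWarning here, which is silenced for the purpose of the demo.
with np.errstate(invalid="ignore"):
    cast = np.array([np.nan]).astype(np.int64)
```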

Collaborator Author

(This is the issue I'm looking at; see #2470 (comment).) Good foresight, @fujiisoup!

Member

@shoyer shoyer left a comment

Looks good to me (except for the redundant clause in the if condition)

xarray/core/variable.py (outdated)
@max-sixty
Collaborator Author

I think this is in a decent place.

If np.nan is passed as fill_value into an int array, we'll get an odd result. But if the user leaves it at the default, it'll work correctly, and I couldn't find a robust general way of solving that.

Let me know thoughts!
Max
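One possible defensive check, offered only as a sketch and not something this PR implements: reject a fill value that cannot be safely cast to the array's dtype. The helper name `check_fill_value` is hypothetical, and the rule relies on `np.min_scalar_type` and `np.can_cast` with "safe" casting, which is stricter than NumPy's own assignment rules and is limited to scalar fill values.

```python
import numpy as np


def check_fill_value(dtype, fill_value):
    """Return ``fill_value`` if it fits ``dtype`` safely, else raise."""
    dtype = np.dtype(dtype)
    # Smallest dtype that can represent this particular scalar value.
    fill_dtype = np.min_scalar_type(fill_value)
    if not np.can_cast(fill_dtype, dtype, casting="safe"):
        raise TypeError(
            f"fill_value {fill_value!r} cannot be safely stored in an "
            f"array of dtype {dtype}"
        )
    return fill_value


ok = check_fill_value(np.int64, 0)
try:
    check_fill_value(np.int64, np.nan)
    nan_rejected = False
except TypeError:
    nan_rejected = True
```

Whether raising is preferable to silently promoting is a design choice the thread leaves open.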

Member

@shoyer shoyer left a comment

Looks good to me (but the what's new note is now in the wrong place)

doc/whats-new.rst (outdated)
doc/whats-new.rst (outdated)
@shoyer shoyer merged commit 85ded91 into pydata:master Dec 27, 2018
@max-sixty max-sixty deleted the shift branch December 28, 2018 01:07
dcherian pushed a commit to yohai/xarray that referenced this pull request Jan 2, 2019
* master:
  DEP: drop python 2 support and associated ci mods (pydata#2637)
  TST: silence warnings from bottleneck (pydata#2638)
  revert to dev version
  DOC: fix docstrings and doc build for 0.11.1
  Source encoding always set when opening datasets (pydata#2626)
  Add flake check to travis (pydata#2632)
  Fix dayofweek and dayofyear attributes from dates generated by cftime_range (pydata#2633)
  silence import warning (pydata#2635)
  fill_value in shift (pydata#2470)
  Flake fixed (pydata#2629)
  Allow passing of positional arguments in `apply` for Groupby objects (pydata#2413)
  Fix failure in time encoding for pandas < 0.21.1 (pydata#2630)
  Fix multiindex selection (pydata#2621)
  Close files when CachingFileManager is garbage collected (pydata#2595)
  added some logic to deal with rasterio objects in addition to filepaths (pydata#2589)
  Get 0d slices of ndarrays directly from indexing (pydata#2625)
  FIX Don't raise a deprecation warning for xarray.ufuncs.{angle,iscomplex} (pydata#2615)
  CF: also decode time bounds when available (pydata#2571)