Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: finite difference method diff #495

Merged
merged 1 commit into from Aug 21, 2015
Merged

ENH: finite difference method diff #495

merged 1 commit into from Aug 21, 2015

Conversation

andreas-h
Copy link
Contributor

adds diff method to both DataArray and Dataset. closes #490

Still to be done:

  • tests

Possible enhancements:

  • allow numeric axis instead of dim for DataArray objects. The signature would then be DataArray.diff(dim=None, n=1, axis=None) and I would need to check that exactly one of dim and axis is None. I find it a bit ugly; DataArray.diff(n=1, dim=None, axis=None) would be nicer in my view. But then the signatures for DataArray.diff and Dataset.diff would be different, which is also not nice.
  • allow specifying the new coordinate array explicitly via a coord kwarg instead of just taking the coordinate values of the upper bounds.

What do you think?

@andreas-h
Copy link
Contributor Author

Actually, the current implementation leads to 0.0 for arrays inside a Dataset which don't have dim as a dimension. I personally would find it more intuitive if those arrays would not be touched at all.

Do you agree, or do you prefer this 0.0 for unaffected arrays?

@shoyer
Copy link
Member

shoyer commented Jul 27, 2015

This looks very nice -- thanks for putting together the PR!

allow numeric axis instead of dim for DataArray objects.

I wouldn't bother with this. It's not so elegant, and dim already takes care of all the desired functionality.

allow specifying the new coordinate array explicitly via a coord kwarg instead of just taking the coordinate values of the upper bounds.

I wouldn't allow full control here, but maybe a keyword argument for choosing whether to take the "lower", "upper" or "mean" labels would be appropriate.

Actually, the current implementation leads to 0.0 for arrays inside a Dataset which don't have dim as a dimension. I personally would find it more intuitive if those arrays would not be touched at all.

I agree. It would be better to skip those variables entirely, like the current behavior for aggregation functions like mean.

return self
if n < 0:
raise ValueError('order `n` must be non-negative but got {}'
''.format((n)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: one extra pair of parenthesis around n

@andreas-h
Copy link
Contributor Author

I wouldn't allow full control here, but maybe a keyword argument for
choosing whether to take the "lower", "upper" or "mean" labels would be
appropriate.

I don't see how 'mean' could work, as the coord might be of dtype str.
But I'm implementing 'upper' and 'lower'. If the user wants something
different, she can always just swap the coord manually.

@andreas-h
Copy link
Contributor Author

I'm not really happy with my implementation; the coordinate/variable handling in lines 1870-1884 are far from elegant. But is there a nicer way to do this?

if dim in var.dims or not var.dims:
if name not in self.coords:
variables[name] = end[name] - start[name]
variables[dim] = variables[name].coords[dim]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would try:

kwargs_new = kwargs_end if coord_new == 'upper' else kwargs_start

for name, var in iteritems(self.variables):
    if dim in var.dims:
        if name in self.data_vars:
            variables[name] = var.isel(**kwargs_end) - var.isel(**kwargs_start)
        else:
            # don't do arithmetic on coordinates
            variables[name] = var.isel(**kwargs_new)
    else:
        # these variables should be unchanged
        variables[name] = var

# this private constructor preserves existing coordinates
# it's also much faster, because it doesn't need to do validation
difference = self._replace_vars_and_dims(variables)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this private constructor preserves existing coordinates

it's also much faster, because it doesn't need to do validation

difference = self._replace_vars_and_dims(variables)

_replace_vars_and_dims gave me an error because the size of the
dimension on which I do the diff is changing. Was I using it wrongly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me test your PR and debug this. My guess is that some of the variables were ending up with inconsistent dimension sizes (the Dataset constructor resolves that by doing an outer join of the index labels).

@shoyer
Copy link
Member

shoyer commented Jul 27, 2015

I don't see how 'mean' could work, as the coord might be of dtype str.
But I'm implementing 'upper' and 'lower'. If the user wants something
different, she can always just swap the coord manually.

Indeed, this would fail for string dtypes. I can see some possible utility in centered coordinates, especially for second order differences. We can certainly leave this for later, though.

@shoyer
Copy link
Member

shoyer commented Jul 27, 2015

Just made a PR against your branch with my suggestion: https://github.com/andreas-h/xray/pull/1

It looks like it's working now.

@shoyer
Copy link
Member

shoyer commented Jul 28, 2015

This could also use a bit of documentation -- at least mention on "What's New" and in the API docs.

if n == 0:
return self
if n < 0:
raise ValueError('order `n` must be non-negative but got {}'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a test that catches each of these exceptions

@shoyer
Copy link
Member

shoyer commented Aug 19, 2015

@andreas-h we're going to release v0.6 in the next day or two. I'd love to include this enhancement in it if you have the time to finish it up :).

@andreas-h
Copy link
Contributor Author

That's strange, I cannot reproduce the test failure on my system ... Any ideas?

@shoyer
Copy link
Member

shoyer commented Aug 20, 2015

Could you try doing rebase -i master to pick out your commits? Otherwise github shows everything since the merge (look at the commit tab on this PR).

# prepare new coordinate
if label == 'upper':
kwargs_new = kwargs_end
elif label == 'lower':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like we still need a unit test for label='lower'

shoyer added a commit that referenced this pull request Aug 21, 2015
ENH: finite difference method `diff`
@shoyer shoyer merged commit f5575b2 into pydata:master Aug 21, 2015
@shoyer
Copy link
Member

shoyer commented Aug 21, 2015

OK, merging.

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add diff method to DataArray class
2 participants