Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for cftime types #2728

Merged
merged 14 commits into from Dec 6, 2018

Conversation

Projects
None yet
4 participants
@philippjfr
Copy link
Contributor

philippjfr commented May 25, 2018

This PR adds support for cftime types which are a custom time type used by xarray for dates that cannot be stored in a datetime64 type. A small example:

ref_date = '0181-01-01'
ds = xr.DataArray([1, 2, 3], dims=['time'],
                  coords={'time': ('time', [1, 2, 3],
                                   {'units': 'days since %s' % ref_date})}
                  ).to_dataset(name='foo')
with xr.set_options(enable_cftimeindex=True):
    ds = xr.decode_cf(ds)
print(ds)
hv_ds = hv.Dataset(ds)
hv_ds.to(hv.Curve)
<xarray.Dataset>
Dimensions:  (time: 3)
Coordinates:
  * time     (time) object 0181-01-02 00:00:00 0181-01-03 00:00:00 ...
Data variables:
    foo      (time) int64 ...

bokeh_plot

This approach seems to work well but there are a questions I have, which @rabernat and @spencerclark hopefully can answer:

  1. Which types does it make sense to support, and do they all support conversion to a timetuple? As far as I could tell this is the list of types cftime provides:
cftime._cftime.Datetime360Day
cftime._cftime.DatetimeGregorian
cftime._cftime.DatetimeJulian
cftime._cftime.DatetimeNoLeap
cftime._cftime.DatetimeProlepticGregorian
  1. Secondly in what context can I expect these types to be used? Will they mostly be used by xarray or should I also handle them elsewhere?
@rabernat

This comment has been minimized.

Copy link

rabernat commented May 25, 2018

Thanks so much for taking this on! It will make climate modelers very happy.

I think I need to defer to @spencerclark or @shoyer to answer your questions, as they understand the details better.

@spencerkclark

This comment has been minimized.

Copy link

spencerkclark commented Jul 11, 2018

@philippjfr @rabernat sorry I wasn't pinged here (note in the future my GitHub user name is "spencerkclark" instead of "spencerclark"), so I'm just seeing these questions now. Thanks for starting to work on this!

Which types does it make sense to support,

In principle any of the types could appear in the wild, so ideally all of them, but if we wanted to prioritize a couple and save the rest for later, the most common would probably be DatetimeGregorian and DatetimeNoLeap.

do they all support conversion to a timetuple?

Yes, all cftime types inherit from cftime.datetime which implements timetuple.

As far as I could tell this is the list of types cftime provides:
cftime._cftime.Datetime360Day
cftime._cftime.DatetimeGregorian
cftime._cftime.DatetimeJulian
cftime._cftime.DatetimeNoLeap
cftime._cftime.DatetimeProlepticGregorian

There is one more date type that cftime supports (cftime.DatetimeAllLeap), but it is rarely used.

Secondly in what context can I expect these types to be used? Will they mostly be used by xarray or should I also handle them elsewhere?

They'll probably appear the most in xarray and Iris data structures (since both support them natively), but they could show up in pure NumPy arrays (with dtype "object") or in Pandas DataFrames.

Handle matplotlib

With regard to matplotlib support, it might be worth looking at nc-time-axis. I think Iris uses it internally (e.g. here) for their built-in plotting functionality.

@philippjfr philippjfr force-pushed the cftime_types branch from 72d71a0 to 18d94de Dec 3, 2018

@philippjfr

This comment has been minimized.

Copy link
Contributor Author

philippjfr commented Dec 3, 2018

I've tried playing with nc-time-axis a bit but I can't actually get it to plot cftime objects with non-standard reference units, e.g. I haven't found any way to plot dates like this:

dates = xr.cftime_range(start='0001', periods=24, freq='MS', calendar='noleap')

My vote is to merge this without matplotlib support for now.

elif pd and isinstance(date, pd.Timestamp):
date = date.to_pydatetime()
if hasattr(date, 'timetuple'):
dt_int = calendar.timegm(date.timetuple())*1000

This comment has been minimized.

@spencerkclark

spencerkclark Dec 3, 2018

Seeing this makes me a little uneasy with respect to use with cftime types. I think the built-in Python calendar module always assumes the Gregorian calendar is used; in cftime, different date types operate on different calendars, so the "seconds since 1970-01-01T00:00:00" value could be different for the same time tuple, depending on the calendar.

But perhaps there's good reason for doing things this way (I'm not familiar with how dates are handled in bokeh/holoviews)?

This comment has been minimized.

@philippjfr

philippjfr Dec 3, 2018

Author Contributor

Thanks, definitely need another look at this. Is there a simple way to convert the values to a Gregorian calendar?

This comment has been minimized.

@spencerkclark

spencerkclark Dec 4, 2018

Ah, I see; this is appropriate if that is what you are looking to do.

However, I'm not sure if this is the ideal approach in general; for instance this will cause there to be extra space (and potentially a tick label) in the place of February 29th, 2000 for a no leap calendar (even though in that calendar that date does not exist). See an example in a gist with this branch. There are other possible edge cases too (in a 360-day calendar some dates do not exist in a Gregorian calendar, like February 30th).

I think nc-time-axis handles these edge cases properly, because it does not rely on converting dates to the Gregorian calendar. See how it handles the example above in this gist (I've also included an example of how to plot with dates generated from xarray.cftime_range).

I understand if it would be a lot more work to solve this in general though.

This comment has been minimized.

@philippjfr

philippjfr Dec 5, 2018

Author Contributor

Thanks again, I'll look at nc_time_axis in a bit more detail, running your notebook is currently giving me an error though, which I had already filled here: SciTools/nc-time-axis#40.

ValueError: invalid year provided in cftime.DatetimeNoLeap(0, 11, 27, 1, 12, 0, 4, 3, 331)

What version of nc_time_axis and cftime have? I think I have the latest version of both:

print(cftime.__version__)
print(nc_time_axis.__version__)
1.0.0
1.1.0

This comment has been minimized.

@spencerkclark

spencerkclark Dec 5, 2018

Sure thing, thanks for the link to that issue. In my notebook I was using cftime 1.0.2.1 and nc_time_axis 1.1.0. I can reproduce your issue if I downgrade cftime to version 1.0.0.

This comment has been minimized.

@philippjfr

philippjfr Dec 5, 2018

Author Contributor

Thanks, I'll try upgrading.

This comment has been minimized.

@philippjfr

philippjfr Dec 6, 2018

Author Contributor

Okay, I think we'll likely have to push ahead with the subtly wrong behavior in bokeh, as a proper solution would require writing custom bokeh models to correctly format dates in different calendars. Would you expect a warning when plotting dates in a non-Gregorian calendar until we've got a proper fix?

@philippjfr philippjfr force-pushed the cftime_types branch from 18d94de to 88bebf3 Dec 5, 2018

Philipp Rudiger added some commits Dec 5, 2018

Philipp Rudiger Philipp Rudiger
Philipp Rudiger Philipp Rudiger
@philippjfr

This comment has been minimized.

Copy link
Contributor Author

philippjfr commented Dec 6, 2018

Here's the current status, you'll note that bokeh does not handle the no-leap case quite correctly, but as mentioned above doing so would require defining custom bokeh models.

screen shot 2018-12-06 at 1 23 08 am

Philipp Rudiger added some commits Dec 6, 2018

@spencerkclark
Copy link

spencerkclark left a comment

Thanks for following through with this. I really appreciate you adding support for these climate-science-specific date types!

Okay, I think we'll likely have to push ahead with the subtly wrong behavior in bokeh, as a proper solution would require writing custom bokeh models to correctly format dates in different calendars.

This is totally fine. I'm not surprised it would require a custom solution in bokeh. For now users can always switch to the matplotlib backend if they want things to be exactly correct.

Would you expect a warning when plotting dates in a non-Gregorian calendar until we've got a proper fix?

If it's not too much trouble I do think a warning would be helpful in the bokeh case. For example we now warn in xarray whenever this kind of conversion is done. At a minimum I think it should be publicly documented somewhere.

'using:\n\tconda install -c conda-forge '
'nc_time_axis')
if isinstance(value, cftime_types):
value = CalendarDateTime(cftime.datetime(*value.timetuple()[0:6]), value.calendar)

This comment has been minimized.

@spencerkclark

spencerkclark Dec 6, 2018

Is there a reason for converting to the generic cftime.datetime here and below? I think you can just pass value and value.calendar to the CalendarDateTime constructor.

This comment has been minimized.

@philippjfr

philippjfr Dec 6, 2018

Author Contributor

Thanks, don't think there was any good reason.

Attempts highest precision conversion of different datetime
formats to milliseconds since the epoch (1970-01-01 00:00:00).
If datetime is a cftime with a non-standard calendar the
caveats describd in cftime_to_timestamp apply.

This comment has been minimized.

@spencerkclark

spencerkclark Dec 6, 2018

Typo in "described"

are converted to standard Gregorian calendar. This can cause
extra space to be added for dates that don't exist in the original
calendar. In order to handle these dates correctly a custom bokeh
model with support for other calendars would have to be defind.

This comment has been minimized.

@spencerkclark

spencerkclark Dec 6, 2018

Typo in "defined."

cftime._cftime.Datetime360Day,
cftime._cftime.DatetimeJulian,
cftime._cftime.DatetimeNoLeap,
cftime._cftime.DatetimeProlepticGregorian

This comment has been minimized.

@spencerkclark

spencerkclark Dec 6, 2018

I think you are missing cftime.DatetimeAllLeap here. That said, all of these are subclasses of cftime.datetime, so I think you could get away with just using cftime.datetime here, rather than enumerating all of the different subclasses (since cftime_types is only used for instance checks).

This comment has been minimized.

@philippjfr

philippjfr Dec 6, 2018

Author Contributor

That's much easier thanks.

Philipp Rudiger added some commits Dec 6, 2018

@philippjfr

This comment has been minimized.

Copy link
Contributor Author

philippjfr commented Dec 6, 2018

If it's not too much trouble I do think a warning would be helpful in the bokeh case. For example we now warn in xarray whenever this kind of conversion is done.

Thanks, I've adapted the error message from xarray for our purposes, e.g.:

Converting cftime.datetime from a non-standard calendar (noleap) to a standard calendar for plotting. This may lead to subtle errors in formatting dates, for accurate tick formatting switch to the matplotlib backend.

Philipp Rudiger added some commits Dec 6, 2018

@philippjfr

This comment has been minimized.

Copy link
Contributor Author

philippjfr commented Dec 6, 2018

@jlstevens This should be ready to review and merge now.

If you're wondering about all the other changes, the dt_to_int utility was all kinds of horrendously wrong with its unit conversions, which was made up for by counter-acting it with various scaling factors in the histogram and datashading operations. The operation now actually converts between time resolutions correctly.

Philipp Rudiger Philipp Rudiger
@jlstevens

This comment has been minimized.

Copy link
Contributor

jlstevens commented Dec 6, 2018

Thanks for tackling this! Tests are green so I'll merge now.

@jlstevens jlstevens merged commit 80dd30e into master Dec 6, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
s3-reference-data-cache Test data is cached.
Details
@spencerkclark

This comment has been minimized.

Copy link

spencerkclark commented Dec 6, 2018

Thanks again @philippjfr!

@philippjfr

This comment has been minimized.

Copy link
Contributor Author

philippjfr commented Dec 6, 2018

Thank you for being so helpful @spencerkclark!

@philippjfr philippjfr deleted the cftime_types branch Dec 13, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.