Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for cftime types #2728

Merged
merged 14 commits into from Dec 6, 2018
Merged

Added support for cftime types #2728

merged 14 commits into from Dec 6, 2018

Conversation

@philippjfr
Copy link
Member

@philippjfr philippjfr commented May 25, 2018

This PR adds support for cftime types which are a custom time type used by xarray for dates that cannot be stored in a datetime64 type. A small example:

ref_date = '0181-01-01'
ds = xr.DataArray([1, 2, 3], dims=['time'],
                  coords={'time': ('time', [1, 2, 3],
                                   {'units': 'days since %s' % ref_date})}
                  ).to_dataset(name='foo')
with xr.set_options(enable_cftimeindex=True):
    ds = xr.decode_cf(ds)
print(ds)
hv_ds = hv.Dataset(ds)
hv_ds.to(hv.Curve)
<xarray.Dataset>
Dimensions:  (time: 3)
Coordinates:
  * time     (time) object 0181-01-02 00:00:00 0181-01-03 00:00:00 ...
Data variables:
    foo      (time) int64 ...

bokeh_plot

This approach seems to work well but there are a questions I have, which @rabernat and @spencerclark hopefully can answer:

  1. Which types does it make sense to support, and do they all support conversion to a timetuple? As far as I could tell this is the list of types cftime provides:
cftime._cftime.Datetime360Day
cftime._cftime.DatetimeGregorian
cftime._cftime.DatetimeJulian
cftime._cftime.DatetimeNoLeap
cftime._cftime.DatetimeProlepticGregorian
  1. Secondly in what context can I expect these types to be used? Will they mostly be used by xarray or should I also handle them elsewhere?
@rabernat
Copy link

@rabernat rabernat commented May 25, 2018

Thanks so much for taking this on! It will make climate modelers very happy.

I think I need to defer to @spencerclark or @shoyer to answer your questions, as they understand the details better.

@spencerkclark
Copy link

@spencerkclark spencerkclark commented Jul 11, 2018

@philippjfr @rabernat sorry I wasn't pinged here (note in the future my GitHub user name is "spencerkclark" instead of "spencerclark"), so I'm just seeing these questions now. Thanks for starting to work on this!

Which types does it make sense to support,

In principle any of the types could appear in the wild, so ideally all of them, but if we wanted to prioritize a couple and save the rest for later, the most common would probably be DatetimeGregorian and DatetimeNoLeap.

do they all support conversion to a timetuple?

Yes, all cftime types inherit from cftime.datetime which implements timetuple.

As far as I could tell this is the list of types cftime provides:
cftime._cftime.Datetime360Day
cftime._cftime.DatetimeGregorian
cftime._cftime.DatetimeJulian
cftime._cftime.DatetimeNoLeap
cftime._cftime.DatetimeProlepticGregorian

There is one more date type that cftime supports (cftime.DatetimeAllLeap), but it is rarely used.

Secondly in what context can I expect these types to be used? Will they mostly be used by xarray or should I also handle them elsewhere?

They'll probably appear the most in xarray and Iris data structures (since both support them natively), but they could show up in pure NumPy arrays (with dtype "object") or in Pandas DataFrames.

Handle matplotlib

With regard to matplotlib support, it might be worth looking at nc-time-axis. I think Iris uses it internally (e.g. here) for their built-in plotting functionality.

@philippjfr
Copy link
Member Author

@philippjfr philippjfr commented Dec 3, 2018

I've tried playing with nc-time-axis a bit but I can't actually get it to plot cftime objects with non-standard reference units, e.g. I haven't found any way to plot dates like this:

dates = xr.cftime_range(start='0001', periods=24, freq='MS', calendar='noleap')

My vote is to merge this without matplotlib support for now.

elif pd and isinstance(date, pd.Timestamp):
date = date.to_pydatetime()
if hasattr(date, 'timetuple'):
dt_int = calendar.timegm(date.timetuple())*1000

Seeing this makes me a little uneasy with respect to use with cftime types. I think the built-in Python calendar module always assumes the Gregorian calendar is used; in cftime, different date types operate on different calendars, so the "seconds since 1970-01-01T00:00:00" value could be different for the same time tuple, depending on the calendar.

But perhaps there's good reason for doing things this way (I'm not familiar with how dates are handled in bokeh/holoviews)?

Copy link
Member Author

@philippjfr philippjfr Dec 3, 2018

Thanks, definitely need another look at this. Is there a simple way to convert the values to a Gregorian calendar?

Ah, I see; this is appropriate if that is what you are looking to do.

However, I'm not sure if this is the ideal approach in general; for instance this will cause there to be extra space (and potentially a tick label) in the place of February 29th, 2000 for a no leap calendar (even though in that calendar that date does not exist). See an example in a gist with this branch. There are other possible edge cases too (in a 360-day calendar some dates do not exist in a Gregorian calendar, like February 30th).

I think nc-time-axis handles these edge cases properly, because it does not rely on converting dates to the Gregorian calendar. See how it handles the example above in this gist (I've also included an example of how to plot with dates generated from xarray.cftime_range).

I understand if it would be a lot more work to solve this in general though.

Copy link
Member Author

@philippjfr philippjfr Dec 5, 2018

Thanks again, I'll look at nc_time_axis in a bit more detail, running your notebook is currently giving me an error though, which I had already filled here: SciTools/nc-time-axis#40.

ValueError: invalid year provided in cftime.DatetimeNoLeap(0, 11, 27, 1, 12, 0, 4, 3, 331)

What version of nc_time_axis and cftime have? I think I have the latest version of both:

print(cftime.__version__)
print(nc_time_axis.__version__)
1.0.0
1.1.0

Sure thing, thanks for the link to that issue. In my notebook I was using cftime 1.0.2.1 and nc_time_axis 1.1.0. I can reproduce your issue if I downgrade cftime to version 1.0.0.

Copy link
Member Author

@philippjfr philippjfr Dec 5, 2018

Thanks, I'll try upgrading.

Copy link
Member Author

@philippjfr philippjfr Dec 6, 2018

Okay, I think we'll likely have to push ahead with the subtly wrong behavior in bokeh, as a proper solution would require writing custom bokeh models to correctly format dates in different calendars. Would you expect a warning when plotting dates in a non-Gregorian calendar until we've got a proper fix?

@philippjfr
Copy link
Member Author

@philippjfr philippjfr commented Dec 6, 2018

Here's the current status, you'll note that bokeh does not handle the no-leap case quite correctly, but as mentioned above doing so would require defining custom bokeh models.

screen shot 2018-12-06 at 1 23 08 am

Copy link

@spencerkclark spencerkclark left a comment

Thanks for following through with this. I really appreciate you adding support for these climate-science-specific date types!

Okay, I think we'll likely have to push ahead with the subtly wrong behavior in bokeh, as a proper solution would require writing custom bokeh models to correctly format dates in different calendars.

This is totally fine. I'm not surprised it would require a custom solution in bokeh. For now users can always switch to the matplotlib backend if they want things to be exactly correct.

Would you expect a warning when plotting dates in a non-Gregorian calendar until we've got a proper fix?

If it's not too much trouble I do think a warning would be helpful in the bokeh case. For example we now warn in xarray whenever this kind of conversion is done. At a minimum I think it should be publicly documented somewhere.

'using:\n\tconda install -c conda-forge '
'nc_time_axis')
if isinstance(value, cftime_types):
value = CalendarDateTime(cftime.datetime(*value.timetuple()[0:6]), value.calendar)

Is there a reason for converting to the generic cftime.datetime here and below? I think you can just pass value and value.calendar to the CalendarDateTime constructor.

Copy link
Member Author

@philippjfr philippjfr Dec 6, 2018

Thanks, don't think there was any good reason.

Attempts highest precision conversion of different datetime
formats to milliseconds since the epoch (1970-01-01 00:00:00).
If datetime is a cftime with a non-standard calendar the
caveats describd in cftime_to_timestamp apply.

Typo in "described"

are converted to standard Gregorian calendar. This can cause
extra space to be added for dates that don't exist in the original
calendar. In order to handle these dates correctly a custom bokeh
model with support for other calendars would have to be defind.

Typo in "defined."

cftime._cftime.Datetime360Day,
cftime._cftime.DatetimeJulian,
cftime._cftime.DatetimeNoLeap,
cftime._cftime.DatetimeProlepticGregorian

I think you are missing cftime.DatetimeAllLeap here. That said, all of these are subclasses of cftime.datetime, so I think you could get away with just using cftime.datetime here, rather than enumerating all of the different subclasses (since cftime_types is only used for instance checks).

Copy link
Member Author

@philippjfr philippjfr Dec 6, 2018

That's much easier thanks.

@philippjfr
Copy link
Member Author

@philippjfr philippjfr commented Dec 6, 2018

If it's not too much trouble I do think a warning would be helpful in the bokeh case. For example we now warn in xarray whenever this kind of conversion is done.

Thanks, I've adapted the error message from xarray for our purposes, e.g.:

Converting cftime.datetime from a non-standard calendar (noleap) to a standard calendar for plotting. This may lead to subtle errors in formatting dates, for accurate tick formatting switch to the matplotlib backend.

@philippjfr
Copy link
Member Author

@philippjfr philippjfr commented Dec 6, 2018

@jlstevens This should be ready to review and merge now.

If you're wondering about all the other changes, the dt_to_int utility was all kinds of horrendously wrong with its unit conversions, which was made up for by counter-acting it with various scaling factors in the histogram and datashading operations. The operation now actually converts between time resolutions correctly.

@jlstevens
Copy link
Contributor

@jlstevens jlstevens commented Dec 6, 2018

Thanks for tackling this! Tests are green so I'll merge now.

@jlstevens jlstevens merged commit 80dd30e into master Dec 6, 2018
2 checks passed
@spencerkclark
Copy link

@spencerkclark spencerkclark commented Dec 6, 2018

Thanks again @philippjfr!

@philippjfr
Copy link
Member Author

@philippjfr philippjfr commented Dec 6, 2018

Thank you for being so helpful @spencerkclark!

@philippjfr philippjfr deleted the cftime_types branch Dec 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants