
Fix datetime decoding when time units are 'days since 0000-01-01 00:00:00' #523

Closed
wants to merge 2 commits into from


rabernat
Contributor

@rabernat rabernat commented Aug 9, 2015

This fixes #521 using the workaround described in Unidata/netcdf4-python#442.

@jhamman
Member

jhamman commented Aug 9, 2015

I have a few general comments:

  1. Can you point me to where in the CF Conventions or UDUNITS the valid time coordinate units are defined?
  2. We should think about whether or not this fix belongs in xray or netCDF4. I am of the opinion that if the CF Conventions do in fact support the units in question, we should apply this fix in the netCDF package. If they don't, I don't think we want to support it here either.
  3. @shoyer is out for a bit but he will almost certainly want to weigh in on this.

If we end up going this route, you'll want to add some unit tests in test_conventions.py.

since_yr_idx = units.index('since ') + 6
year = int(units[since_yr_idx:since_yr_idx+4])
assert year==0, 'this fix only works for days since year 0'
year_diff = year - 0001
Member


Python 3 is raising a SyntaxError here. This should just be year_diff = year - 1.
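For reference, a minimal corrected version of the parsing step quoted above. It keeps the PR's approach of slicing the four characters after 'since ', which assumes a zero-padded four-digit year, and uses a plain 1. Python 3 rejects 0001 because nonzero decimal integer literals may not have leading zeros. The helper name reference_year is hypothetical.

```python
def reference_year(units):
    """Return the reference year from CF-style units such as
    'days since 0000-01-01 00:00:00'.

    Assumes a zero-padded four-digit year right after 'since '.
    """
    since_yr_idx = units.index('since ') + len('since ')
    return int(units[since_yr_idx:since_yr_idx + 4])


units = 'days since 0000-01-01 00:00:00'
year = reference_year(units)
year_diff = year - 1  # plain literal; '0001' does not compile in Python 3
print(year, year_diff)  # 0 -1

# The original line fails to compile under Python 3:
try:
    compile('year_diff = year - 0001', '<diff>', 'exec')
except SyntaxError:
    print('SyntaxError: leading zeros are not allowed in decimal literals')
```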

@ocefpaf
Contributor

ocefpaf commented Aug 9, 2015

Can you point me to where in the CF Conventions or UDUNITS the valid time coordinate units are defined?

This edge case is defined here: http://cfconventions.org/Data/cf-conventions/cf-conventions-1.6/build/cf-conventions.html#climatological-statistics

NB: People who follow the COARDS conventions will also need this. Any UDUNITS wrapper can deal with that for you:

from datetime import datetime
import cf_units
u = cf_units.Unit('days since 0000-01-01 00:00:00', calendar=cf_units.CALENDAR_NO_LEAP)
ut = u.utime()
# Returns a fake datetime object (See http://scitools.org.uk/iris/docs/latest/iris/iris/unit.html#iris.unit.Unit.num2date)
ut.num2date(0)
# 0-01-01 00:00:00
# Note that Python's datetime cannot take year = 0, but UDUNITS did the "right" thing.
ut.date2num(datetime(1, 1, 1))
# 365.0
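The 365.0 above is the length of year 0 in a 365-day calendar. The following minimal pure-Python sketch reproduces that count without cf_units. It assumes the 'noleap' calendar (every year has 365 days) and ignores the time of day. The function name noleap_days_since_year0 is hypothetical.

```python
import datetime

# Python's datetime cannot represent year 0: the smallest allowed year is 1.
try:
    datetime.datetime(0, 1, 1)
except ValueError as err:
    print('datetime rejects year 0:', err)

# Days elapsed before the first of each month in a 365-day ("noleap") year.
CUM_DAYS = (0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334)


def noleap_days_since_year0(year, month, day):
    """Days from 0000-01-01 00:00:00 to year-month-day in a noleap calendar."""
    return 365 * year + CUM_DAYS[month - 1] + (day - 1)


print(noleap_days_since_year0(1, 1, 1))  # 365
```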

We should think about whether or not this fix belongs in xray or netCDF4. I am of the opinion that if the CF Conventions do in fact support the units in question, we should apply this fix in the netCDF package. If they don't, I don't think we want to support it here either.

I strongly disagree with that. netCDF is not a CDM (Common Data Model) and should not follow the CF conventions! I know that iris follows CF very closely (annoyingly so, in fact) and that xray "kind of" follows it (which is great, BTW). However, if such conventions were adopted in the netCDF package, how would we load non-CF files?

@ocefpaf
Contributor

ocefpaf commented Aug 9, 2015

Note that I am not questioning the validity of those dates, nor their use. But they are out there, and people need to deal with that monstrosity feature. If xray decides to support this, it has to be implemented very carefully and with some warnings...

@rabernat
Contributor Author

rabernat commented Aug 9, 2015

@ocefpaf NCAR is one of the lead institutions in terms of the CF conventions. Yet the CESM POP model, also developed at NCAR, has this "year 0" issue! To me this suggests that any practical, real-world application will need to deal with special cases like this one.

@ocefpaf
Contributor

ocefpaf commented Aug 9, 2015

@ocefpaf NCAR is one of the lead institutions in terms of the CF conventions. Yet the CESM POP model, also developed at NCAR, has this "year 0" issue! To me this suggests that any practical, real-world application will need to deal with special cases like this one.

I know 😉 That is why I use cf_units!

PS: Do you have an OPeNDAP endpoint for your data? I would like to run a few tests here.

@rabernat
Contributor Author

@ocefpaf Unfortunately there is no OPeNDAP endpoint, but I put a sample file on our FTP server:
ftp://ftp.ldeo.columbia.edu/pub/rpa/pop_sample/b40.1850.track1.2deg.wcm.007.pop.h.0100-01.nc

@ocefpaf
Contributor

ocefpaf commented Aug 10, 2015

@rabernat here is an example of how to read those dates using cf_units (big download for a small example 😉):

http://nbviewer.ipython.org/gist/ocefpaf/d14bd8ad24f4e1a47b19

My point with this example is: netCDF4-python should not do that for you! And if xray wants to go down that road, it must be done with some warnings and good documentation!

By accepting these rules (by rules I mean how UDUNITS interprets the dates and how that interpretation is part of the CF conventions), you are leaving the world of "read my data" and entering a world of "interpret my data."

cf_units interprets your data as CF-compliant and does what you expect from that Common Data Model.

d.hour, d.minute, d.second)
for d in time])
warnings.warn('Detected time units with reference date of year 0. '
'This is not valid according to CF conventions. '
Copy link
Contributor


It is valid! It is not recommended.
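If xray did decode these units, the warning could follow the reviewer's wording. Here is a minimal sketch that assumes a hypothetical helper named warn_if_year_zero_reference. Its simple regular expression only recognises reference dates of the form '<unit> since 0...-'.

```python
import re
import warnings

# Matches e.g. 'days since 0000-01-01 ...' but not 'days since 0001-01-01'.
_YEAR_ZERO_REF = re.compile(r'^\s*\w+\s+since\s+0+-')


def warn_if_year_zero_reference(units):
    """Warn (instead of failing) when time units use a year-0 reference date.

    Returns True if a warning was issued.
    """
    if _YEAR_ZERO_REF.match(units):
        warnings.warn(
            f'Time units {units!r} use a reference date in year 0. This is '
            'valid but not recommended under the CF conventions; decoded '
            'dates depend on the calendar and may be shifted.',
            UserWarning,
        )
        return True
    return False
```

Returning a flag as well as warning lets a caller decide whether to fall back to leaving the raw numbers undecoded.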

@rabernat
Contributor Author

Here is another example of a dataset with a similar time encoding issue. This is a valid OPeNDAP URL:
http://data.nodc.noaa.gov/thredds/dodsC/woa/WOA13/DATAv2/temperature/netcdf/decav/1.00/woa13_decav_t16_01v2.nc

The time units are 'months since 0000-01-01 00:00:00'. This is the very popular World Ocean Atlas. Since it is a monthly climatology, the dates are referenced to year 0.

@shoyer
Member

shoyer commented Aug 13, 2015

I would also strongly prefer to get this fix in netCDF4 upstream rather than in xray, if possible.

@ocefpaf
Contributor

ocefpaf commented Aug 13, 2015

I would also strongly prefer to get this fix in netCDF4 upstream rather than in xray, if possible.

I disagree. Maybe not even xray should "fix" this. There are two issues here:

  1. Interpreting year 0 as 1;
  2. Non-standard calendars.

Maybe (2) should be in np.datetime64 (and pandas), but even that is unlikely to become a reality given its niche use.

(1) is an aberrant UDUNITS interpretation that made it into some conventions (CF and COARDS), and it does not belong in the netCDF4 package. Does xray want to become a CF CDM? I sure hope not! I have iris for that, but for everything else I use xray.

And if people really want to get those niche date specifications from xray, they can just read the raw time data and parse it using one of the several UDUNITS wrappers or convention compliance checkers out there. Here is another one that should do the right thing for this case: https://pypi.python.org/pypi/coards/1.0.5

@ocefpaf
Contributor

ocefpaf commented Aug 13, 2015

Just so you see how messy this can be: year 0000, precisely because it does not exist, is used to store climatologies in COARDS and CF. The coards package issues a warning but does the wrong thing:

from coards import parse
units = 'days since 0000-01-01 00:00:00'
parse(0, units)
# coards/__init__.py:60: UserWarning: Shifted data 366 days to the future, since year zero does not exist.
#   UserWarning)
# datetime.datetime(1, 1, 1, 0, 0)

If someone accidentally saves that date back from this object, it will no longer be year 0, and other CDMs might choke because they no longer recognize it as a climatology. cf_units does the right thing:

import cf_units
u = cf_units.Unit('days since 0000-01-01 00:00:00', calendar=cf_units.CALENDAR_NO_LEAP)
ut = u.utime()
ut.num2date(0)
# 0-01-01 00:00:00

But that object is pretty much useless for pandas and xray because of the non-standard calendar. It is useful for annotating plots or for saving the data's metadata back exactly as it was before.
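The 366 days in the coards warning are consistent with treating year 0 as a leap year. In astronomical numbering (year 0 = 1 BC), year 0 is a leap year under both the Julian and proleptic Gregorian rules, while a noleap calendar gives it 365 days. The following minimal sketch shows that dependence for a few CF calendar names. The helper names are hypothetical, and the mixed 'standard'/'gregorian' calendar is left out of this sketch.

```python
def is_leap(year, calendar):
    """Leap-year rule for a few CF calendar names.

    Uses astronomical numbering, so year 0 is 1 BC.
    """
    if calendar in ('noleap', '365_day'):
        return False
    if calendar in ('all_leap', '366_day'):
        return True
    if calendar == 'julian':
        return year % 4 == 0
    if calendar == 'proleptic_gregorian':
        return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    raise ValueError(f'calendar not covered by this sketch: {calendar!r}')


def days_in_year(year, calendar):
    """Number of days in the given year under the given calendar."""
    return 366 if is_leap(year, calendar) else 365


for cal in ('proleptic_gregorian', 'julian', 'noleap'):
    print(cal, days_in_year(0, cal))  # 366, 366, 365
```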

@shoyer
Member

shoyer commented Aug 13, 2015

@ocefpaf To be clear, by "strongly prefer to get this fix upstream" I mostly meant that I am reluctant to include this in xray.

I would like it to be straightforward for others to extend our reading capabilities for netCDF files by adding custom logic like this to their own equivalent of xray.open_dataset, building on what we have in xray.
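Here is a minimal sketch of what such user-side custom logic could look like. It assumes a hypothetical registry keyed on the units string, and none of these names exist in xray. User-registered decoders are tried first. A None result means no custom decoder matched, so a real wrapper would hand the variable back to the library's normal decoding.

```python
import re

# Hypothetical registry of (compiled units pattern, decoder) pairs.
# None of these names exist in xray; they only illustrate the extension idea.
CUSTOM_TIME_DECODERS = []


def register_time_decoder(units_pattern):
    """Decorator: register func as the decoder for units matching units_pattern."""
    def wrap(func):
        CUSTOM_TIME_DECODERS.append((re.compile(units_pattern), func))
        return func
    return wrap


def decode_time(values, units, calendar='standard'):
    """Try user-registered decoders first; None means 'fall back to the default'."""
    for pattern, func in CUSTOM_TIME_DECODERS:
        if pattern.match(units):
            return func(values, units, calendar)
    return None


@register_time_decoder(r'^days since 0+-01-01')
def year0_noleap_days(values, units, calendar):
    """Keep year-0 noleap times as (year, day_of_year) pairs instead of datetimes."""
    if calendar not in ('noleap', '365_day'):
        raise ValueError(f'this sketch only handles noleap calendars, got {calendar!r}')
    return [divmod(int(v), 365) for v in values]


print(decode_time([0, 400], 'days since 0000-01-01 00:00:00', 'noleap'))  # [(0, 0), (1, 35)]
print(decode_time([0], 'days since 1900-01-01 00:00:00'))  # None
```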

@ocefpaf
Contributor

ocefpaf commented Aug 13, 2015

@ocefpaf To be clear, by "strongly prefer to get this fix upstream" I mostly meant that I am reluctant to include this in xray.

Great. We don't need another interpretation of the CF standards out there!

See the WOA13 dataset above as an example of the problems this brings. The data has no calendar attribute and uses units of "months since". CF discourages the use of months for obvious reasons, and having no calendar translates to:

gregorian or standard: Mixed Gregorian/Julian calendar as defined by Udunits.

Which one?! Year zero is invalid in one and (kind of) accepted in the other!!

Long story short: the dataset is valid CF, but it is unclear how to interpret the dates. No one cares because people using that dataset know that t16 means Autumn (oh no, wait!! I am in the Southern Hemisphere, so that means Spring 😜)

PS: Note that the CF conventions also define a 'none' calendar, which is more appropriate for datasets like that.
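This example makes the "months" ambiguity concrete. Read as calendar months, 'months since 0000-01-01' offsets map cleanly onto month numbers, but UDUNITS treats a month as a fixed fraction of a year. The minimal sketch below assumes that UDUNITS' year is the tropical year of 365.242198781 days and that a month is year/12. Check your UDUNITS database before relying on that value. The helper names are hypothetical.

```python
# Assumption: UDUNITS defines "year" as 365.242198781 days and "month" as year/12.
UDUNITS_MONTH_DAYS = 365.242198781 / 12


def as_calendar_months(offset):
    """Read an offset in 'months since 0000-01-01' as whole calendar months.

    Returns (year, month) with month in 1..12.
    """
    year, month_index = divmod(int(offset), 12)
    return year, month_index + 1


def as_udunits_days(offset):
    """Read the same offset as fixed-length UDUNITS months, converted to days."""
    return offset * UDUNITS_MONTH_DAYS


for offset in (1, 6, 11):
    print(offset, as_calendar_months(offset), round(as_udunits_days(offset), 2))
```

Under these assumptions, an offset of 6 lands about 182.6 days after 1 January. In a 365-day year that is 2 July rather than the 1 July a calendar-month reading gives.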

@rabernat
Contributor Author

OK, thanks for the thoughtful discussion. I understand why you both feel this shouldn't be implemented in xray. My one objection is that I don't think climatological time is such a "niche" issue: processing climate model output (much of which has no specific calendar date associated with it but still has seasonal cycles, etc.) seems like one of the most useful applications for xray.

I come at this from a very practical point of view. I need to use certain datasets (e.g. WOA13, POP model output) for my research and teaching. I want to teach xray in my fall physical oceanography class; it is perfect for teaching because it "just works" and lets students quickly load data and do basic analysis. This PR would allow me to do that. I understand the counter-arguments, but I don't know exactly how I should proceed if this can't be part of xray. The options seem to be:

  1. Don't use the datasets
  2. Don't use xray
  3. Use xray without time support

Is there something else I'm missing?

@ocefpaf
Contributor

ocefpaf commented Aug 14, 2015

I want to teach xray in my fall physical oceanography class

@rabernat I hear you! I suffer from the same problem. But I think we should teach students how to defend themselves against bad practices and tool limitations. (We have both here: non-standard calendars and accepted-but-not-recommended standards.)

  3. Use xray without time support

Not ideal, but that is probably the way to go. My guess is that you can load the data in xray, but you cannot get those dates into the pandas indexing machinery, right? That is not too bad, since there are not many "date operations" you can do anyway. Most of the time we only need to translate the time information into a label for figures. If that is the case, you are OK with the tools we have now.
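Here is a minimal sketch of turning raw time values into figure labels without going through pandas. It assumes 'days since 0000-01-01' units, a noleap calendar, and non-negative values. The helper name noleap_label is hypothetical.

```python
# Days elapsed before the first of each month in a 365-day ("noleap") year.
CUM_DAYS = (0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334)
MONTHS = ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')


def noleap_label(days_since_year0):
    """Label like 'Feb 0100' for a raw 'days since 0000-01-01' value (noleap).

    Works on plain numbers, so year 0 never has to fit into datetime or
    datetime64. Assumes non-negative input (int() truncates toward zero).
    """
    year, day_of_year = divmod(int(days_since_year0), 365)
    month_index = max(i for i, start in enumerate(CUM_DAYS) if start <= day_of_year)
    return f'{MONTHS[month_index]} {year:04d}'


raw_times = [0.0, 31.0, 36500.0]
print([noleap_label(t) for t in raw_times])  # ['Jan 0000', 'Feb 0000', 'Jan 0100']
```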

@rabernat
Contributor Author

Regarding the non-standard calendar support, is it worth opening issues in numpy / pandas?

@ocefpaf
Contributor

ocefpaf commented Aug 14, 2015

Regarding the non-standard calendar support, is it worth opening issues in numpy / pandas?

Worth? Yes. Any hope to actually get it in there? No...

@jhamman
Member

jhamman commented Aug 14, 2015

Worth? Yes. Any hope to actually get it in there? No...

I think I disagree. There is almost no chance anyone outside of the climate community is going to spend time on this, but if calendar support were added in a responsible way to numpy and pandas, I don't see why they wouldn't be interested. So it will need to come from the climate user community, which, IMHO, is underrepresented in the dev community.

If you don't want to open the issue, I will.

@ocefpaf
Contributor

ocefpaf commented Aug 14, 2015

@jhamman Sorry for making this thread longer than it should be. But I don't think you disagree! You are just more optimistic than I am 😉

But you are right. We need a champion from the climate community. And if either of you opens the issue, I will be the second man on the hill.

@shoyer
Member

shoyer commented Aug 14, 2015

Calendar support in numpy is conceivable, but it will pretty much require fixing numpy dtypes first so that they can be parametrized and extended by third parties in Python (this is on the roadmap). Right now the datetime64 type itself is pretty buggy, in large part because it's written in C code that nobody is maintaining.


For pandas, I think the bigger issue is that pandas only supports datetime64 with nanosecond (ns) resolution. Simply adding microsecond (us) support would go a long way toward solving this. See here for some discussion on the pandas side: pandas-dev/pandas#7307
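This sketch puts numbers on the resolution point. datetime64[ns] stores an int64 count of nanoseconds since 1970-01-01, so it only spans roughly 1677 to 2262, and year 0 is far out of reach. The minimal sketch below uses only the standard library to compute those bounds. Because timedelta has microsecond precision, the nanosecond limits are floored to microseconds.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1
INT64_MIN = -2**63

# Nanosecond counts, floored to microseconds because timedelta stops there.
ns_earliest = EPOCH + timedelta(microseconds=INT64_MIN // 1000)
ns_latest = EPOCH + timedelta(microseconds=INT64_MAX // 1000)
print(ns_earliest, ns_latest)

# The same int64 counting microseconds instead covers about +/- 292,000 years.
SECONDS_PER_YEAR = 365.2425 * 86400  # mean Gregorian year
us_span_years = INT64_MAX / 1e6 / SECONDS_PER_YEAR
print(round(us_span_years))
```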


@rabernat
Contributor Author

I am the least knowledgeable person here regarding numpy and pandas development. The issues should probably be opened by someone else.

@jhamman
Member

jhamman commented Aug 14, 2015

I submitted a feature request over at numpy. I'm going to close this now.

@rabernat - thanks for the PR and keep them coming.

@jhamman jhamman closed this Aug 14, 2015
Development

Successfully merging this pull request may close these issues.

time decoding error with "days since"
4 participants