Fix datetime decoding when time units are 'days since 0000-01-01 00:00:00' #523
I have a few general comments,
If we end up going this route, you'll want to add some unit tests in test_conventions.py.
since_yr_idx = units.index('since ') + 6
year = int(units[since_yr_idx:since_yr_idx+4])
assert year==0, 'this fix only works for days since year 0'
year_diff = year - 0001
Python 3 raises a SyntaxError here. This should just be year_diff = year - 1.
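For context, Python 3 rejects integer literals with leading zeros such as 0001 (octal must be written with the 0o prefix), which is why the line above fails to compile. A minimal sketch of extracting the reference year and computing the offset with a plain decimal literal follows; the helper name reference_year and the regular expression are illustrative assumptions, not code from this PR.

```python
import re


def reference_year(units):
    """Return the reference year from a units string such as
    'days since 0000-01-01 00:00:00' (hypothetical helper)."""
    match = re.search(r"since\s+(-?\d{1,4})-", units)
    if match is None:
        raise ValueError(f"no reference date found in {units!r}")
    # int() accepts zero-padded strings, unlike integer literals in source code
    return int(match.group(1))


year = reference_year("days since 0000-01-01 00:00:00")
year_diff = year - 1  # plain decimal literal instead of 0001
```

Parsing with a regular expression rather than a fixed four-character slice is only a design choice of this sketch; the slice in the diff works equally well for zero-padded years.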
This edge case is defined here: http://cfconventions.org/Data/cf-conventions/cf-conventions-1.6/build/cf-conventions.html#climatological-statistics NB: People who follow the COARDS conventions will also need this. Any UDUNITS wrapper can deal with that for you:
from datetime import datetime
import cf_units
u = cf_units.Unit('days since 0000-01-01 00:00:00', calendar=cf_units.CALENDAR_NO_LEAP)
ut = u.utime()
# Returns a fake datetime object (See http://scitools.org.uk/iris/docs/latest/iris/iris/unit.html#iris.unit.Unit.num2date)
ut.num2date(0)
0-01-01 00:00:00
# Note that python datetime cannot take year = 0, but udunits did the "right" thing.
ut.date2num(datetime(1, 1, 1))
365.0
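The 365.0 above follows from the no-leap calendar: every year, including year 0, has exactly 365 days. As a rough illustration that does not depend on cf_units, the same arithmetic can be sketched in pure Python. The function name noleap_days_to_date is hypothetical, and the sketch only handles whole days.

```python
from datetime import datetime

# Month lengths in a 365-day ("noleap") calendar
DAYS_PER_MONTH = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def noleap_days_to_date(days, ref_year=0):
    """Convert whole days since ref_year-01-01 into (year, month, day)
    in a 365-day calendar (hypothetical helper, whole days only)."""
    year_offset, day_of_year = divmod(int(days), 365)
    month = 1
    for length in DAYS_PER_MONTH:
        if day_of_year < length:
            break
        day_of_year -= length
        month += 1
    return ref_year + year_offset, month, day_of_year + 1


# The standard library cannot represent year 0 at all
try:
    datetime(0, 1, 1)
    year_zero_supported = True
except ValueError:
    year_zero_supported = False
```

Returning a plain tuple sidesteps the year-0 limitation of datetime, at the cost of losing datetime arithmetic.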
I strongly disagree with that. netCDF is not one CDM and should not follow the CF-conventions! I know that iris does follow CF very closely (annoyingly in fact) and xray "kind of" follows it (which is great BTW). However, if such conventions were adopted in the netCDF package, how would we load non-CF files?
Note that I am not questioning the validity of those dates or their use. But they are out there and people need to deal with that.
@ocefpaf NCAR is one of the lead institutions in terms of the CF conventions. Yet the CESM POP model, also developed at NCAR, has this "year 0" issue! To me this suggests that any practical, real-world application will need to deal with special cases like this one.
I know 😉 that is why I use
PS: Do you have an OPeNDAP endpoint for your data? I would like to run a few tests here.
@ocefpaf Unfortunately no OPeNDAP endpoint, but I put a sample file on our ftp server
@rabernat here is an example on how to read those dates using http://nbviewer.ipython.org/gist/ocefpaf/d14bd8ad24f4e1a47b19 My point with this example is: By accepting these rules (by rules I mean how UDUNITS interprets the dates and how that interpretation is part of the CF-convention) you are leaving the world of "read my data" and entering a world of "interpret my data."
d.hour, d.minute, d.second)
for d in time])
warnings.warn('Detected time units with reference date of year 0. '
'This is not valid according to CF conventions. '
It is valid! It is not recommended.
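One way to reflect this distinction in the warning text is sketched below. The function name warn_if_year_zero, the substring check, and the exact wording are assumptions made for illustration; they are not what the PR ships.

```python
import warnings


def warn_if_year_zero(units):
    """Warn when the units string uses a year-0 reference date.

    Hypothetical helper: returns True if a warning was issued.
    """
    if "since 0000-" in units:
        warnings.warn(
            "Detected time units with a reference date of year 0. "
            "This is allowed by the CF conventions but not recommended.",
            UserWarning,
            stacklevel=2,  # point at the caller rather than this helper
        )
        return True
    return False
```

Returning a boolean lets the caller decide whether to apply a year-0 workaround after the warning has been raised.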
Here is another example of a dataset with a similar time encoding issue. This is a valid OPeNDAP URL: The time units are 'months since 0000-01-01 00:00:00'. This is the very popular World Ocean Atlas. Since it is a monthly climatology, the dates are referenced to year 0.
I would also strongly prefer to get this fix in netCDF4 upstream rather than in xray, if possible.
I disagree. Maybe not even xray should "fix" this. There are two issues here:
Maybe (2) should be in (1) is an UDUNITS And if people really want to get those niche date specifications from xray, they can just read the raw time data and parse it using one of the several UDUNITS wrappers or convention compliance checkers out there. Here is another one that should do the right thing for that case: https://pypi.python.org/pypi/coards/1.0.5
Just so you see how messy this can be. The year
from coards import parse
units = 'days since 0000-01-01 00:00:00'
parse(0, units)
coards/__init__.py:60: UserWarning: Shifted data 366 days to the future, since year zero does not exist.
UserWarning)
datetime.datetime(1, 1, 1, 0, 0)
If someone accidentally saves that date back with this object, it will no longer be year 0 and other CDMs might choke by not recognizing it as climatology.
import cf_units
u = cf_units.Unit('days since 0000-01-01 00:00:00', calendar=cf_units.CALENDAR_NO_LEAP)
ut = u.utime()
ut.num2date(0)
0-01-01 00:00:00
But that object is pretty much useless for pandas and xray because of the non-standard calendar. The object is useful to annotate plots or to save the metadata back the same way it was before.
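The coards behaviour shown above (value 0 decoding to datetime(1, 1, 1)) amounts to silently moving the reference date from year 0 to year 1. A minimal sketch of that idea, and of why a careless round trip changes the metadata, might look like the following. The function name decode_with_shift is hypothetical, and the sketch assumes 'days since 0000-01-01 ...' units and the standard-library proleptic Gregorian calendar.

```python
from datetime import datetime, timedelta


def decode_with_shift(values, units):
    """Decode 'days since 0000-01-01 ...' by treating the reference as
    0001-01-01 (hypothetical helper mirroring the coards output above)."""
    if not units.startswith("days since 0000-01-01"):
        raise ValueError(f"unsupported units: {units!r}")
    # The units that now match the decoded dates; they are no longer year 0
    shifted_units = units.replace("0000", "0001", 1)
    reference = datetime(1, 1, 1)
    dates = [reference + timedelta(days=v) for v in values]
    return dates, shifted_units


dates, new_units = decode_with_shift([0, 31], "days since 0000-01-01 00:00:00")
```

If the decoded dates are written back with new_units, the file no longer carries the year-0 reference that other tools may rely on to recognize a climatology.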
@ocefpaf To be clear, by "strongly prefer to get this fix upstream" I mostly meant that I am reluctant to include this in xray. I would like it to be straightforward for others to extend our reading capabilities for netcdfs by adding custom logic like this for their own equivalent of
Great. We don't need another interpretation of the CF-standards out there! See the WOA13 dataset above as an example of the problems this brings. The data has no calendar and uses units
Which one? Year zero is invalid in one and (kind of) accepted in the other! Long story short: the dataset is valid CF but it is unclear how to interpret the dates. No one cares because people using that dataset know that PS: Note that there is a
Ok, thanks for the thoughtful discussion. I understand why you both feel this shouldn't be implemented in xray. My one objection to the discussion is that I don't think that climatological time is such a "niche" issue: processing climate model output (much of which has no specific calendar date associated with it but still has seasonal cycles, etc.) seems like one of the most useful applications for xray.
I come at this from a very practical point of view. I need to use certain datasets (e.g. WOA13, POP model output) for my research and teaching. I want to teach xray in my fall physical oceanography class. It is perfect for teaching because it "just works" and allows students to quickly load data and do basic analysis easily. This PR would allow me to do that.
I understand the counter-arguments, but I don't know exactly how I should proceed if this can't be part of xray. The options seem to be:
Is there something else I'm missing?
@rabernat I hear you! I suffer from the same problem. But I think we should teach students how to defend themselves from bad practices and tool limitations. (We have both in there! I am talking about non-standard calendars and accepted but non-recommended standards.)
Not ideal but that is probably the way to go. My guess is that you can load the data in xray, but you cannot get those dates into the pandas indexing machinery, right? That is not too bad since there are not many date operations that you can do anyway. Most of the time we only need to translate the time information into a label for the figures. If that is the case you are OK with the tools we have now.
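As a rough illustration of turning raw climatological time values into figure labels without any datetime machinery, consider the sketch below. It assumes units of 'months since 0000-01-01 00:00:00' as in the World Ocean Atlas example; the sample values and the helper name month_label are made up for illustration.

```python
import math

MONTH_NAMES = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def month_label(months_since_ref):
    """Map a 'months since <January of some year>' value to a month name
    (hypothetical helper; the year part is deliberately discarded)."""
    return MONTH_NAMES[math.floor(months_since_ref) % 12]


# Hypothetical mid-month values for a monthly climatology
raw_time = [0.5, 1.5, 2.5, 11.5]
labels = [month_label(t) for t in raw_time]
```

Because only the month index is used, the reference year never has to be interpreted, which avoids the year-0 question entirely for plotting purposes.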
Regarding the non-standard calendar support, is it worth opening issues in numpy / pandas?
Worth? Yes. Any hope to actually get it in there? No...
I think I disagree. There is almost no chance anyone outside of the climate community is going to spend time on this but, if calendar support was added in a responsible way to numpy and pandas, I don't see why they wouldn't be interested. So it will need to come from the climate user community, which IMHO is underrepresented in the dev community. If you don't want to open the issue, I will.
@jhamman sorry for making this thread longer than it should be. But I don't think you disagree! You are just more optimistic than I am 😉 But you are right. We need a champion from the climate community. And if either of you opens the issue I will be the second man on the hill.
Calendar support in numpy is conceivable, but it will pretty much require fixing numpy dtypes first so that they can be parametrized and extended by third parties in Python (this is on the roadmap). Right now the datetime64 type itself is pretty buggy, in large part because it's written in C code that nobody is maintaining. For pandas, I think the bigger issue is that pandas only does datetime64 with ns resolution. Simply adding us support would go a long way toward solving this. See here for some discussion on the pandas side: pandas-dev/pandas#7307
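To see why the resolution matters, here is a back-of-the-envelope sketch of the time span a signed 64-bit integer counter covers around the 1970 epoch at nanosecond versus microsecond resolution. It uses only the standard library and an average Gregorian year length, so the bounds are approximate.

```python
# Average Gregorian year length in seconds (365.2425 days)
SECONDS_PER_YEAR = 365.2425 * 86400

# A signed 64-bit counter reaches about 2**63 ticks on either side of the epoch
MAX_TICKS = 2 ** 63

half_range_ns_years = MAX_TICKS / (SECONDS_PER_YEAR * 1e9)  # nanosecond ticks
half_range_us_years = MAX_TICKS / (SECONDS_PER_YEAR * 1e6)  # microsecond ticks

earliest_ns_year = 1970 - half_range_ns_years
latest_ns_year = 1970 + half_range_ns_years
```

With nanosecond ticks the representable range is roughly 1678 to 2262, which excludes year 0 and year 1 alike; with microsecond ticks the half-range grows by a factor of 1000, to several hundred thousand years.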
I am the least knowledgeable person here regarding numpy and pandas development. The issues should probably be opened by someone else.
I submitted a feature request over at numpy. I'm going to close this now. @rabernat - thanks for the PR and keep them coming.
This fixes #521 using the workaround described in Unidata/netcdf4-python#442.