Stack Contains NaT in Time Field #235
Comments
You can drop them before trying to crop to the AOI like this:
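A minimal sketch of dropping the NaT time steps, using NumPy only. The `times` array is a hypothetical stand-in for `stack.time.values`, and the xarray call in the final comment assumes the time dimension is named "time"; neither is taken from the original comment.

```python
import numpy as np

# Hypothetical time coordinate, standing in for stack.time.values.
times = np.array(
    ["2022-09-03T10:32:02.517", "NaT", "2022-09-05T10:22:19.490"],
    dtype="datetime64[ns]",
)

# Boolean mask: True where the time step carries a valid timestamp.
valid = ~np.isnat(times)

# Keep only the valid time steps.
clean_times = times[valid]
print(clean_times)

# With an xarray DataArray, the same mask could be applied along the
# time dimension, e.g. stack.isel(time=valid) (assumed dimension name).
```

Note that this discards the affected scenes entirely, so it is only a stopgap.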
@clausmichele is right; a workaround for now is to drop the missing item. Or you could manually fix the input datetime properties to all be in the same format. However, I consider this a stackstac bug. The problem indeed is with that one datetime near the end that doesn't have fractional seconds:
The STAC spec just says that datetimes should be formatted according to RFC 3339, section 5.6. We can see there that the fractional-seconds part is optional, so both forms are valid. I found that the issue seems to be related to the datetime parsing in stackstac/stackstac/prepare.py, lines 408 to 412 in 694c686.
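To illustrate the RFC 3339 point, here is a simplified shape check, not a full validator. The pattern and the helper name `has_rfc3339_shape` are illustrative assumptions; the only thing it is meant to show is that the fractional-seconds group is optional, so both kinds of timestamp in this issue are well formed.

```python
import re

# Simplified pattern for the RFC 3339 date-time production. It checks the
# shape only, not value ranges (for example, it would accept month 13).
RFC3339_SHAPE = re.compile(
    r"^\d{4}-\d{2}-\d{2}[Tt]\d{2}:\d{2}:\d{2}"  # full-date "T" hh:mm:ss
    r"(\.\d+)?"                                 # time-secfrac (optional)
    r"([Zz]|[+-]\d{2}:\d{2})$"                  # time-offset
)


def has_rfc3339_shape(value: str) -> bool:
    """Return True if value looks like an RFC 3339 date-time."""
    return RFC3339_SHAPE.match(value) is not None


print(has_rfc3339_shape("2022-09-03T10:32:02.517000Z"))
print(has_rfc3339_shape("2022-09-03T10:32:05Z"))
```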
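Weirdly, with pandas 1.3.5, using errors='raise' parses every timestamp without complaint, while errors='coerce' turns the one without fractional seconds into NaT:
>>> ts = sorted(item.properties["datetime"] for item in items)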
>>> pd.to_datetime(ts, infer_datetime_format=True, errors='raise')
DatetimeIndex(['2022-08-04 10:32:05.634000+00:00',
...
'2022-09-03 10:32:02.517000+00:00',
'2022-09-03 10:32:05+00:00',
'2022-09-05 10:22:19.490000+00:00',
'2022-09-05 10:22:20.519000+00:00'],
dtype='datetime64[ns, UTC]', freq=None)
>>> pd.to_datetime(ts, infer_datetime_format=True, errors='coerce')
DatetimeIndex(['2022-08-04 10:32:05.634000', '2022-08-04 10:32:08.099000',
'2022-08-06 10:22:16.627000', '2022-08-06 10:22:18.455000',
'2022-08-09 10:32:12.888000', '2022-08-09 10:32:15.351000',
'2022-08-11 10:22:08.959000', '2022-08-11 10:22:10.785000',
'2022-08-14 10:32:04.138000', '2022-08-14 10:32:06.617000',
'2022-08-16 10:22:17.889000', '2022-08-16 10:22:19.720000',
'2022-08-19 10:32:13.767000', '2022-08-19 10:32:16.217000',
'2022-08-21 10:22:06.588000', '2022-08-21 10:22:08.412000',
'2022-08-24 10:32:01.411000', '2022-08-24 10:32:03.910000',
'2022-08-26 10:22:20.430000', '2022-08-26 10:22:21.453000',
'2022-08-29 10:32:13.577000', '2022-08-29 10:32:16.024000',
'2022-08-31 10:22:06.328000', '2022-08-31 10:22:08.153000',
'2022-09-03 10:32:02.517000', 'NaT',
'2022-09-05 10:22:19.490000', '2022-09-05 10:22:20.519000'],
dtype='datetime64[ns]', freq=None)
However, after switching to pandas 2, we do get a helpful and informative error:
>>> pd.__version__
2.0.3
>>> pd.to_datetime(ts, errors='raise')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-7-0ac50347e0b9> in <cell line: 1>()
----> 1 pd.to_datetime(ts, errors='raise')
...
ValueError: time data "2022-09-03T10:32:05Z" doesn't match format "%Y-%m-%dT%H:%M:%S.%f%z", at position 25. You might want to try:
- passing `format` if your strings have a consistent format;
- passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
- passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.
This makes sense: there are multiple datetime formats present, so no single inferred format can match every string. Given that the STAC spec indicates all datetimes should be a subset of ISO 8601, passing format='ISO8601' parses them all:
>>> pd.to_datetime(ts, errors='raise', format='ISO8601')
DatetimeIndex(['2022-08-04 10:32:05.634000+00:00',
...
'2022-08-31 10:22:08.153000+00:00',
'2022-09-03 10:32:02.517000+00:00',
'2022-09-03 10:32:05+00:00',
'2022-09-05 10:22:19.490000+00:00',
'2022-09-05 10:22:20.519000+00:00'],
dtype='datetime64[ns, UTC]', freq=None)
I think I'll probably fix this by using
Thanks guys, that worked great.
Glad that works for you, but dropping data doesn't seem like an ideal workaround to me, so I'll reopen this until the underlying problem is fixed.
To provide a bit more context to this bug, I have been using stackstac to document HLS (Landsat 8/9 + Sentinel-2) clear observation availability over Canada and came across these NaT time-steps. My "fix" has been to drop the NaT time-steps in a similar manner as demonstrated here, and I have been documenting how prevalent they are across the HLS archive. Originally, I thought this was a metadata issue within the HLS archive, and only recently realized this was a stackstac bug; otherwise I would have fixed the missing dates rather than removing them. For HLS, this impacts Sentinel-2 more than Landsat. Across the couple of years of imagery I have documented so far (2018 - 2020), it impacts <0.1% of Landsat images and 0.2 - 1% of Sentinel-2 images, depending on the year. It seems to be more prevalent in 2018 for S30 (1%), dropping to 0.2% by 2020. Most of the time, the NaT time-steps are a couple of images here and there (e.g., <5 in a year looking at all available imagery). However, I have noticed a few cases where almost every time-step in a year is NaT. Here is a basic reproducible example of this case:
In this case, the first datetime in the stack is
When doing NRT work or phenology, for example, every image counts! So it is good to document and hopefully fix these edge cases.
Agreed that it should be stackstac's responsibility to parse these correctly. In the meantime, reformatting the datetimes before stacking does ensure all dates are consistently formatted, and I get my expected xr.DataArray.
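One way to make the datetime strings consistent before stacking, sketched with the standard library only. The helper name `normalize_datetime` and the plain-dict items are assumptions for illustration; real STAC item objects may expose their properties differently, and this is not necessarily the approach used in the original comment.

```python
from datetime import datetime, timezone


def normalize_datetime(value: str) -> str:
    """Re-emit an RFC 3339 timestamp in UTC with six fractional digits."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11,
    # so swap it for an explicit offset first. Before 3.11 it also expects
    # either no fractional part or exactly 3 or 6 fractional digits.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


# Hypothetical items, modelled as plain dicts.
items = [
    {"properties": {"datetime": "2022-09-03T10:32:02.517000Z"}},
    {"properties": {"datetime": "2022-09-03T10:32:05Z"}},
]

# Rewrite every datetime in place so all strings share one format.
for item in items:
    props = item["properties"]
    props["datetime"] = normalize_datetime(props["datetime"])

print([item["properties"]["datetime"] for item in items])
```

After this pass every string has the same length and layout, so a parser that infers one format from the first element should no longer coerce the odd one out to NaT.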
Finally fixed in 0.5.1, now available on PyPI!
I am having a problem with missing time values in my stack. Reproduce as follows:
The query contains 28 items.
All of the items have times. All times are 27 char strings, apart from "2022-09-03T10:32:05Z" - is this the source of the bug?
However, when I stack them:
stack = stackstac.stack(items, epsg=4326)
...one of the items has its time replaced with a NaT.
This is causing an error when I try to crop the stack to my AOI. What is a suitable workaround for this? Many thanks.
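The length check described above (every datetime is 27 characters except one) can be sketched as a small helper that flags the outliers. The function name `odd_length_datetimes` and the sample strings are illustrative; in practice the input would be the `datetime` property of each item in the query result.

```python
from collections import Counter


def odd_length_datetimes(datetimes):
    """Return (index, value) pairs whose length differs from the most common one."""
    if not datetimes:
        return []
    # Find the string length shared by most of the timestamps.
    lengths = Counter(len(d) for d in datetimes)
    common_length, _ = lengths.most_common(1)[0]
    # Anything with a different length is formatted differently.
    return [(i, d) for i, d in enumerate(datetimes) if len(d) != common_length]


datetimes = [
    "2022-09-03T10:32:02.517000Z",
    "2022-09-03T10:32:05Z",
    "2022-09-05T10:22:19.490000Z",
]
print(odd_length_datetimes(datetimes))
```

Running a check like this before stacking makes it easy to see which items are likely to end up with a NaT time.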