Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

search result depends on datetime format #644

Closed
johntruckenbrodt opened this issue Feb 21, 2024 · 9 comments · Fixed by #686
Closed

search result depends on datetime format #644

johntruckenbrodt opened this issue Feb 21, 2024 · 9 comments · Fixed by #686

Comments

@johntruckenbrodt
Copy link

Hi. I just observed some unexpected behavior when comparing the search result from queries with datetime as string and as datetime.datetime object. Can you help me out here?

The first query returns 20 scenes in the defined time interval. The second query only returns 7 scenes outside the interval.

from datetime import datetime
from pystac_client import Client


def get_href(item):
    assets = item.assets
    ref = assets[list(assets.keys())[0]]
    return ref.href


################################################################
url = 'https://planetarycomputer.microsoft.com/api/stac/v1'
collections = ['sentinel-1-grd']

acq_start = '2022-02-02T12:00:00'
acq_end = '2022-02-02T12:30:00'

pattern = '%Y-%m-%dT%H:%M:%S'
acq_start_dt = datetime.strptime(acq_start, pattern)
acq_end_dt = datetime.strptime(acq_end, pattern)

catalog = Client.open(url)
################################################################
# query 1: datetime as str
result = catalog.search(collections=collections, max_items=None,
                        datetime=[acq_start, acq_end])
result = list(result.items())
print(f'got {len(result)} scenes')
scenes = [get_href(x) for x in result]
print('\n'.join(scenes))
################################################################
# query 2: datetime as datetime.datetime
result = catalog.search(collections=collections, max_items=None,
                        datetime=[acq_start_dt, acq_end_dt])
result = list(result.items())
print(f'got {len(result)} scenes')
scenes = [get_href(x) for x in result]
print('\n'.join(scenes))
@TomAugspurger
Copy link
Collaborator

I'm not sure if it's the cause of the observed difference, but pystac-client will "pad" the end datetime when you pass a string. See the datetime parameter at https://pystac-client.readthedocs.io/en/stable/api.html#item-search.

I wonder what you get if you set pattern = "Y-%m-%dT%H:%M:%S.%f" or "Y-%m-%dT%H:%M:%S.999"

@johntruckenbrodt
Copy link
Author

Thanks a lot @TomAugspurger for helping to solve this. I changed the strings and pattern but get the same result.
In the second case, the time is shifted to one hour earlier as can be seen when calling ItemSearch.url_with_parameters:

...datetime=2022-02-02T12%3A00%3A00Z%2F2022-02-02T12%3A30%3A00Z...
...datetime=2022-02-02T11%3A00%3A00Z%2F2022-02-02T11%3A30%3A00Z...

@johntruckenbrodt
Copy link
Author

johntruckenbrodt commented Feb 21, 2024

Okay so the problem is in ItemSearch._to_utc_isoformat. The following line converts all datetime.datetime objects to UTC:

dt = dt.astimezone(timezone.utc)

I guess in my case the one hour difference comes from the difference between UTC and CET where I am.

>>> from datetime import datetime, timezone
>>> acq_start = '2022-02-02T12:00:00'
>>> pattern = '%Y-%m-%dT%H:%M:%S'
>>> acq_start_dt = datetime.strptime(acq_start, pattern)
>>> acq_start_dt_utc = acq_start_dt.astimezone(timezone.utc)
>>> print(acq_start_dt_utc)
2022-02-02 11:00:00+00:00

@TomAugspurger
Copy link
Collaborator

TomAugspurger commented Feb 21, 2024 via email

@gadomski
Copy link
Member

I'd be open to deprecating the current behavior unless
there's strong precedence elsewhere in the stac-utils ecosystem around how
tz-naive datetimes are treated.

There's not a "usual" behavior as far as I know, so changing pystac-client to do the least surprising thing gets a 👍🏼 from me.

@gadomski
Copy link
Member

gadomski commented May 9, 2024

🤔 I dug into this a bit, and I think pystac-client is doing "the right thing", i.e. interpreting strings w/o timezone information as UTC:

return f"{component}Z", None
else:
year = int(match.group("year"))
optional_month = match.group("month")
optional_day = match.group("day")
if optional_day is not None:
start = datetime_(
year,
int(optional_month),
int(optional_day),
0,
0,
0,
tzinfo=tzutc(),
)
end = start + relativedelta(days=1, seconds=-1)
elif optional_month is not None:
start = datetime_(year, int(optional_month), 1, 0, 0, 0, tzinfo=tzutc())
end = start + relativedelta(months=1, seconds=-1)
else:
start = datetime_(year, 1, 1, 0, 0, 0, tzinfo=tzutc())
end = start + relativedelta(years=1, seconds=-1)
return self._to_utc_isoformat(start), self._to_utc_isoformat(end)

@johntruckenbrodt I think two different results are correct for your original post, since acq_start_dt = datetime.strptime(acq_start, pattern) is creating a datetime in your local TZ, which will be different than how pystac-client interprets the string.

@johntruckenbrodt
Copy link
Author

Ah I see. Wouldn't it then be more intuitive to assume the same time zone for strings and datetime objects if not specified by the user?
From how I interpret this, the time zone interpretation for unaware objects is up to the application.

Currently, the documentation of ItemSearch reads:

Instances of datetime.datetime may be either timezone aware or unaware. Timezone aware instances will be converted to a UTC timestamp before being passed to the endpoint. Timezone unaware instances are assumed to represent UTC timestamps.

However, datetime.strptime(acq_start, pattern) is creating an unaware object and the object is still converted to UTC.

@gadomski
Copy link
Member

@johntruckenbrodt right you are — thanks for sticking with it. I agree, #686 should fix -- LMK if that matches your understanding.

@johntruckenbrodt
Copy link
Author

Great! IMO this fixes it. Thanks a lot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants