handling discontinuous temporal datasets #81

Closed
raybellwaves opened this issue Oct 18, 2023 · 1 comment
Labels
documentation Improvements or additions to documentation

Comments

@raybellwaves
Contributor

This may be more of a constraint of Earth Engine (GEE) than of Xee. I find it interesting that:

import ee
import xarray
ic = ee.ImageCollection("LANDSAT/LC08/C02/T1_TOA")
ds = xarray.open_dataset(ic, engine="ee")

provides data for 2014-06-09T14:08:03.992 to 2015-03-03T14:26:15.166:

>>> ds["time"]
<xarray.DataArray 'time' (time: 1715353)>
array(['2014-06-09T14:08:03.992000000', '2014-06-25T14:08:05.834000000',
       '2014-07-11T14:08:14.718000000', ..., '2015-03-03T14:25:27.235000000',
       '2015-03-03T14:25:51.203000000', '2015-03-03T14:26:15.166000000'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2014-06-09T14:08:03.992000 ... 2015-03-03T...

Data outside that range exists, though:

ic = ee.ImageCollection("LANDSAT/LC08/C02/T1_TOA").filterDate("2023-01-01", "2023-01-31")
ds = xarray.open_dataset(ic, engine="ee")

and I'm guessing the earlier result comes from a period of continuously available data (~2014 to ~2015)?

This may go hand-in-hand with #47 with an internal (or intake) data catalog tool.

@jdbcode
Member

jdbcode commented Dec 20, 2023

Sort by the time dimension. In this case the EE ImageCollection is not sorted by system:time_start, so you need to sort either the EE ImageCollection or the Xarray Dataset. The former, on a collection this large, does not appear to complete successfully, likely because it exceeds memory limits. Sorting the Xarray Dataset appears to work as expected.

Not sorted by time

First and last:

ds['time'][0]
> 2014-06-09T14:08:03.992000000

ds['time'][-1]
> 2015-03-03T14:26:15.166000000

Min and max:

ds['time'].min()
> 2013-03-18T15:58:14.742000000

ds['time'].max()
> 2023-12-13T12:45:59.470000000

Sorted by time

ds_time_sorted = ds.sortby('time')
ds_time_sorted['time']
<xarray.DataArray 'time' (time: 1740748)>
array(['2013-03-18T15:58:14.742000000', '2013-03-18T15:59:26.127000000',
       '2013-03-18T15:59:49.921000000', ..., '2023-12-13T12:44:47.645000000',
       '2023-12-13T12:45:35.528000000', '2023-12-13T12:45:59.470000000'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2013-03-18T15:58:14.742000 ... 2023-12-13T...

This shows that the data spans 2013-03-18T15:58:14.742 to 2023-12-13T12:45:59.47, as expected, and is now in ascending time order.
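The same behaviour can be reproduced without Earth Engine. The sketch below is only an illustration: the four timestamps and the band name `B1` are made up, and the Dataset is built by hand rather than opened with `engine="ee"`. It shows that positional first/last follow storage order, that min/max report the true extent, and that `sortby` reorders the data along with the coordinate.

```python
import numpy as np
import xarray

# Synthetic stand-in for an unsorted time axis (values are made up).
times = np.array(
    ["2014-06-09", "2015-03-03", "2013-03-18", "2023-12-13"],
    dtype="datetime64[ns]",
)
ds = xarray.Dataset(
    {"B1": ("time", np.arange(times.size, dtype="float64"))},
    coords={"time": times},
)

# Positional first/last follow storage order, not chronological order.
first = ds["time"].values[0]
last = ds["time"].values[-1]

# min/max give the true temporal extent regardless of order.
t_min = ds["time"].values.min()
t_max = ds["time"].values.max()

# Sorting along "time" reorders the data variables with the coordinate.
ds_time_sorted = ds.sortby("time")

print("first/last:", first, last)
print("min/max:   ", t_min, t_max)
print("sorted:    ", ds_time_sorted["time"].values[[0, -1]])
```

Comparing `ds["time"][0]` against `ds["time"].min()` is a quick way to spot an unsorted collection before slicing by date.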

Comments

I suppose the returned Dataset could be pre-sorted by primary_dim_name (default 'time'), but if a person changes primary_dim_property to something that is not time, it's unknown whether ascending sort would be desired. It might be better to let the requester handle sorting themselves.
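If sorting is left to the requester, a small user-side helper might look like the sketch below. `ensure_sorted` is a hypothetical name, not part of Xee, and the `cloud_cover` dimension is an invented example of a non-time primary dimension where descending order could be preferred. The helper only assumes that the dimension has an index on the Dataset.

```python
import numpy as np
import xarray


def ensure_sorted(ds, dim="time", ascending=True):
    """Return ds sorted along dim, skipping the sort if already ordered.

    Hypothetical user-side helper; not part of Xee.
    """
    index = ds.indexes[dim]  # pandas Index backing the dimension
    if ascending:
        already_sorted = index.is_monotonic_increasing
    else:
        already_sorted = index.is_monotonic_decreasing
    return ds if already_sorted else ds.sortby(dim, ascending=ascending)


# Invented example: a non-time primary dimension, sorted descending.
ds = xarray.Dataset(
    {"B1": ("cloud_cover", np.array([1.0, 2.0, 3.0]))},
    coords={"cloud_cover": np.array([10.0, 50.0, 30.0])},
)
sorted_desc = ensure_sorted(ds, dim="cloud_cover", ascending=False)
print(sorted_desc["cloud_cover"].values)
```

Keeping the direction as an explicit argument avoids guessing whether ascending order is wanted for a given primary dimension.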

Closing the issue since the original concern has been explained/resolved and the behavior seems to be working as intended (WAI). Please open a new feature request issue if you think some form of pre-sorting should be enabled. Tagging as documentation as a reminder to include a note that EE ImageCollections may not be in ascending temporal order, and how to sort the resulting Dataset.

@jdbcode jdbcode closed this as completed Dec 20, 2023
@jdbcode jdbcode added the documentation Improvements or additions to documentation label Dec 20, 2023