Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check that time is not already a coordinate in CF writer #1987

Merged
merged 7 commits into from Feb 4, 2022

Conversation

pnuu
Copy link
Member

@pnuu pnuu commented Jan 21, 2022

Fix for CF writer for cases like swath data where scan line times can be already set as a coordinate for the along-track dimension. In these cases the time dimension shouldn't be added.

Below is what happens with the current main branch, which should have been in a separate issue.

  • Tests added

I was trying to save multiple NWC SAF PPS granules to a single NetCDF file:

#!/usr/bin/env python

import glob

from satpy import Scene


def main():
    fnames = glob.glob(
        "/home/lahtinep/data/satellite/polar/ears_pps/*48471.nc")
    dsets = ['cma', 'ct', 'ctth_alti', 'ctth_pres', 'ctth_tempe', 'lon', 'lat']
    glbl = Scene(reader='nwcsaf-pps_nc', filenames=fnames)
    glbl.load(dsets)
    dtype_int16 = {'dtype': np.int16}
    dtype_float32 = {'dtype': np.float32}
    encoding = {'cma': dtype_int16,
                'ct': dtype_int16}
    glbl.save_datasets(writer='cf', base_dir='/tmp/', encoding=encoding, include_lonlats=True)


if __name__ == "__main__":
    main()

Using the current main branch, this fails with:

Can't load ancillary dataset cma_conditions
Can't load ancillary dataset cma_quality
Can't load ancillary dataset cma_status_flag
Can't load ancillary dataset cma_testlist1
Can't load ancillary dataset cma_testlist2
/home/lahtinep/Software/pytroll/pytroll_packages/satpy/satpy/writers/cf_writer.py:571: FutureWarning: The default behaviour of the CF writer will soon change to not compress data by default.
  warnings.warn("The default behaviour of the CF writer will soon change to not compress data by default.",
/home/lahtinep/Software/pytroll/pytroll_packages/satpy/satpy/writers/cf_writer.py:742: UserWarning: Dtype uint8 not compatible with CF-1.7.
  warnings.warn('Dtype {} not compatible with {}.'.format(str(ds.dtype), CF_VERSION))
Traceback (most recent call last):
  File "/home/lahtinep/bin/test_ears_nwc2netcdf.py", line 64, in <module>
    main()
  File "/home/lahtinep/bin/test_ears_nwc2netcdf.py", line 55, in main
    glbl.save_datasets(writer='cf', base_dir='/tmp/', encoding=encoding, include_lonlats=True)
  File "/home/lahtinep/Software/pytroll/pytroll_packages/satpy/satpy/scene.py", line 1143, in save_datasets
    return writer.save_datasets(dataarrays, compute=compute, **save_kwargs)
  File "/home/lahtinep/Software/pytroll/pytroll_packages/satpy/satpy/writers/cf_writer.py", line 840, in save_datasets
    datas, start_times, end_times = self._collect_datasets(
  File "/home/lahtinep/Software/pytroll/pytroll_packages/satpy/satpy/writers/cf_writer.py", line 753, in _collect_datasets
    new_var = self.da2cf(new_ds, epoch=epoch, flatten_attrs=flatten_attrs,
  File "/home/lahtinep/Software/pytroll/pytroll_packages/satpy/satpy/writers/cf_writer.py", line 658, in da2cf
    new_data = CFWriter._encode_time(new_data, epoch)
  File "/home/lahtinep/Software/pytroll/pytroll_packages/satpy/satpy/writers/cf_writer.py", line 706, in _encode_time
    new_data = new_data.expand_dims('time')
  File "/home/lahtinep/mambaforge/envs/pytroll/lib/python3.9/site-packages/xarray/core/dataarray.py", line 1952, in expand_dims
    ds = self._to_temp_dataset().expand_dims(dim, axis)
  File "/home/lahtinep/mambaforge/envs/pytroll/lib/python3.9/site-packages/xarray/core/dataset.py", line 3658, in expand_dims
    raise ValueError(
ValueError: time already exists as coordinate or variable name.

@pnuu
Copy link
Member Author

pnuu commented Jan 25, 2022

Some details on the data in _encode_time() method:

ipdb> new_data
<xarray.DataArray 'cma' (y: 10062, x: 2048)>
dask.array<concatenate, shape=(10062, 2048), dtype=uint8, chunksize=(360, 1024), chunktype=numpy.ndarray>
Coordinates:
    time     (y) datetime64[ns] 2022-01-20T11:05:30 ... 2022-01-20T11:32:28.5...
    crs      object +proj=latlong +datum=WGS84 +ellps=WGS84 +type=crs
Dimensions without coordinates: y, x
Attributes:
    valid_range:          [0 1]
    ancillary_variables:  cma_pal
    units:                1
    _FillValue:           255
    standard_name:        cloud_binary_mask
    flag_values:          [  0   1 255]
    flag_meanings:        cloudfree cloudy no_data
    sensor:               {'seviri'}
    long_name:            SAFNWC PPS CMA Cloud Mask
    platform_name:        MetopB
    coordinates:          lon lat
    start_time:           2022-01-20 11:05:00.100000
    end_time:             2022-01-20 11:32:56.900000
    reader:               nwcsaf-pps_nc
    modifiers:            ()

So the time is already in new_data.coords, but not in .dims. I tried this

time_array = new_data.coords["time"]
del new_data.coords["time"]
new_data = new_data.expand_dims({'time': time_array})

This results in an obviously incorrect result:

ipdb> new_data
<xarray.DataArray 'cma' (time: 10062, y: 10062, x: 2048)>
dask.array<broadcast_to, shape=(10062, 10062, 2048), dtype=uint8, chunksize=(10062, 360, 1024), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 2022-01-20T11:05:30 ... 2022-01-20T11:32:2...
    crs      object +proj=latlong +datum=WGS84 +ellps=WGS84 +type=crs
Dimensions without coordinates: y, x
Attributes:
    valid_range:          [0 1]
    ancillary_variables:  cma_pal
    units:                1
    _FillValue:           255
    standard_name:        cloud_binary_mask
    flag_values:          [  0   1 255]
    flag_meanings:        cloudfree cloudy no_data
    sensor:               {'seviri'}
    long_name:            SAFNWC PPS CMA Cloud Mask
    platform_name:        MetopB
    coordinates:          lon lat
    start_time:           2022-01-20 11:05:00.100000
    end_time:             2022-01-20 11:32:56.900000
    reader:               nwcsaf-pps_nc
    modifiers:            ()

With my current modification one of the tests fail due to a missing time dimension. Any suggestions how this should be fixed?

@zxdawn
Copy link
Member

zxdawn commented Jan 25, 2022

Maybe we can just set the start_time as time? like this:
https://github.com/sfinkens/satpy/blob/7faeb9d72b6d0534ddbc633c1a8d5ff0eef7574a/satpy/multiscene.py#L59-L63

@pnuu pnuu changed the title Check that time is not already a coordinate Check that time is not already a coordinate in CF writer Jan 27, 2022
@pnuu pnuu removed the help wanted label Jan 27, 2022
@pnuu pnuu self-assigned this Jan 27, 2022
@pnuu
Copy link
Member Author

pnuu commented Jan 27, 2022

I think this is more-or-less a proper fix. What I don't understand is that I need to use f['test-array_time'] instead of f['time'] to access the times when reading the data back in the test. If the tests pass, I'll mark this as reday for review.

@codecov
Copy link

codecov bot commented Jan 27, 2022

Codecov Report

Merging #1987 (77f7fc1) into main (cccbe06) will increase coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1987      +/-   ##
==========================================
+ Coverage   93.54%   93.56%   +0.01%     
==========================================
  Files         279      281       +2     
  Lines       41322    41447     +125     
==========================================
+ Hits        38654    38779     +125     
  Misses       2668     2668              
Flag Coverage Δ
behaviourtests 4.76% <0.00%> (-0.02%) ⬇️
unittests 94.09% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
satpy/modifiers/geometry.py 87.30% <ø> (ø)
satpy/tests/writer_tests/test_cf.py 99.69% <100.00%> (+<0.01%) ⬆️
satpy/writers/cf_writer.py 94.35% <100.00%> (+0.05%) ⬆️
satpy/readers/msu_gsa_l1b.py 100.00% <0.00%> (ø)
satpy/tests/reader_tests/test_msu_gsa_l1b.py 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cccbe06...77f7fc1. Read the comment docs.

@pnuu pnuu marked this pull request as ready for review January 27, 2022 12:08
@coveralls
Copy link

coveralls commented Jan 27, 2022

Coverage Status

Coverage increased (+0.02%) to 94.023% when pulling 77f7fc1 on pnuu:bugfix-nc-time-coordinate into cccbe06 on pytroll:main.

Copy link
Member

@sfinkens sfinkens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for taking care of this! Just two minor comments.

The changes in multiscene and slstr_l1b come from rebasing the main branch I guess?

satpy/tests/writer_tests/test_cf.py Outdated Show resolved Hide resolved
satpy/writers/cf_writer.py Outdated Show resolved Hide resolved
@sfinkens
Copy link
Member

Maybe we can just set the start_time as time? like this: https://github.com/sfinkens/satpy/blob/7faeb9d72b6d0534ddbc633c1a8d5ff0eef7574a/satpy/multiscene.py#L59-L63

@zxdawn That was about a Multiscene timeseries, but the problem here also occurs with a normal Scene

Copy link
Member

@mraspaud mraspaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mraspaud mraspaud merged commit 53491bf into pytroll:main Feb 4, 2022
@pnuu pnuu deleted the bugfix-nc-time-coordinate branch February 4, 2022 10:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants