
Failed to save tropomi nc file with specific variables loaded #1493

Closed
zxdawn opened this issue Dec 25, 2020 · 20 comments · Fixed by #1588


zxdawn commented Dec 25, 2020

Describe the bug
I got an error similar to #1031 when I tried to save the loaded Scene of TROPOMI NO2 L2 data to a NetCDF file.

Failed to load coordinates for 'DataID(name='latitude', modifiers=())'
Failed to load coordinates for 'DataID(name='longitude', modifiers=())'
Can't load ancillary dataset /PRODUCT/SUPPORT_DATA/INPUT_DATA/surface_pressure
Can't load ancillary dataset /PRODUCT/SUPPORT_DATA/INPUT_DATA/surface_pressure
[DEBUG: 2020-12-25 13:32:46 : satpy.writers] Reading ['/mnt/raid/xin/miniconda3/envs/s5p_2021/lib/python3.8/site-packages/satpy/etc/writers/cf.yaml']
[INFO: 2020-12-25 13:32:46 : satpy.writers.cf_writer] Saving datasets to NetCDF4/CF.
/yin_raid/xin/miniconda3/envs/s5p_2021/lib/python3.8/site-packages/satpy/writers/cf_writer.py:255: UserWarning: Coordinate "longitude" referenced by dataset surface_albedo does not exist, dropping reference.
  warnings.warn('Coordinate "{}" referenced by dataset {} does not exist, dropping reference.'.format(
Traceback (most recent call last):
  File "test_save.py", line 37, in <module>
    s5p.save_datasets(filename='./output/test.nc')
  File "/yin_raid/xin/miniconda3/envs/s5p_2021/lib/python3.8/site-packages/satpy/scene.py", line 1368, in save_datasets
    return writer.save_datasets(datasets, compute=compute, **save_kwargs)
  File "/yin_raid/xin/miniconda3/envs/s5p_2021/lib/python3.8/site-packages/satpy/writers/cf_writer.py", line 704, in save_datasets
    datas, start_times, end_times = self._collect_datasets(
  File "/yin_raid/xin/miniconda3/envs/s5p_2021/lib/python3.8/site-packages/satpy/writers/cf_writer.py", line 609, in _collect_datasets
    link_coords(datas)
  File "/yin_raid/xin/miniconda3/envs/s5p_2021/lib/python3.8/site-packages/satpy/writers/cf_writer.py", line 253, in link_coords
    dataset[coord] = datas[coord]
  File "/yin_raid/xin/miniconda3/envs/s5p_2021/lib/python3.8/site-packages/xarray/core/dataarray.py", line 647, in __setitem__
    self.coords[key] = value
  File "/yin_raid/xin/miniconda3/envs/s5p_2021/lib/python3.8/site-packages/xarray/core/coordinates.py", line 40, in __setitem__
    self.update({key: value})
  File "/yin_raid/xin/miniconda3/envs/s5p_2021/lib/python3.8/site-packages/xarray/core/coordinates.py", line 118, in update
    self._update_coords(coords, indexes)
  File "/yin_raid/xin/miniconda3/envs/s5p_2021/lib/python3.8/site-packages/xarray/core/coordinates.py", line 293, in _update_coords
    raise ValueError(
ValueError: cannot add coordinates with new dimensions to a DataArray

To Reproduce

import os
import glob
from satpy.scene import Scene
#from satpy.utils import debug_on; debug_on()

s5p_nc_dir = 's5p_data'
f_s5p_pattern = os.path.join(s5p_nc_dir, '*20190725T*')
f_s5p = glob.glob(f_s5p_pattern)

s5p = Scene(f_s5p, reader='tropomi_l2')
vnames = ['surface_albedo', 'latitude']
s5p.load(vnames)
s5p.save_datasets(filename='./output/test.nc')

This issue happens when specific variables such as nitrogendioxide_stratospheric_column, surface_albedo, or surface_pressure are loaded together with longitude or latitude in vnames.
Note that this issue did not exist before #1357 was merged.

Environment Info:

  • OS: Linux
  • Satpy Version: 0.23.0

djhoese commented Dec 25, 2020

Just curious, why are you explicitly loading longitude and latitude?

Could you print out the latitude DataArray? I'm curious what the coordinates, attributes, and dimensions are before trying to save it.


zxdawn commented Dec 26, 2020

@djhoese Because longitude and latitude are original variables stored in the TROPOMI L2 data, we need to save them for later use.

Here's some information about s5p['latitude'] and s5p['surface_albedo']:

<xarray.DataArray 'latitude' (y: 389, x: 450)>
dask.array<where, shape=(389, 450), dtype=float32, chunksize=(389, 450), chunktype=numpy.ndarray>
Coordinates:
    time       datetime64[ns] 2019-07-25
    latitude   (y, x) float32 dask.array<chunksize=(389, 450), meta=np.ndarray>
    longitude  (y, x) float32 dask.array<chunksize=(389, 450), meta=np.ndarray>
Dimensions without coordinates: y, x
Attributes:
    start_time:           2019-07-25 04:27:43
    end_time:             2019-07-25 06:09:12
    long_name:            pixel center latitude
    units:                degrees_north
    standard_name:        latitude
    valid_min:            -90.0
    valid_max:            90.0
    bounds:               /PRODUCT/SUPPORT_DATA/GEOLOCATIONS/latitude_bounds
    name:                 latitude
    file_type:            tropomi_l2
    file_key:             PRODUCT/latitude
    coordinates:          ('longitude', 'latitude')
    modifiers:            ()
    platform_shortname:   S5P
    sensor:               tropomi
    _satpy_id:            DataID(name='latitude', modifiers=())
    ancillary_variables:  []



<xarray.DataArray 'surface_albedo' (y: 389, x: 450)>
dask.array<where, shape=(389, 450), dtype=float32, chunksize=(389, 450), chunktype=numpy.ndarray>
Coordinates:
    crs      object +proj=latlong +datum=WGS84 +ellps=WGS84 +type=crs
Dimensions without coordinates: y, x
Attributes:
    start_time:            2019-07-25 04:27:43
    end_time:              2019-07-25 06:09:12
    units:                 1
    standard_name:         surface_albedo
    long_name:             Surface albedo in the cloud product
    coordinates:           ('longitude', 'latitude')
    radiation_wavelength:  758.0
    name:                  surface_albedo
    file_key:              PRODUCT/SUPPORT_DATA/INPUT_DATA/surface_albedo
    file_type:             tropomi_l2
    modifiers:             ()
    platform_shortname:    S5P
    sensor:                tropomi
    area:                  Shape: (389, 450)\nLons: <xarray.DataArray 'longit...
    _satpy_id:             DataID(name='surface_albedo', modifiers=())
    ancillary_variables:   []


djhoese commented Dec 28, 2020

If you remove the coordinates from the .attrs of the lon/lats before saving, does it give you a different error?

del scn['longitude'].attrs['coordinates']
del scn['latitude'].attrs['coordinates']


zxdawn commented Dec 28, 2020

I get the same error if I add del s5p['latitude'].attrs['coordinates'].


djhoese commented Dec 28, 2020

What if you remove the coordinates attribute from all loaded DataArrays? There may be a chance that you have to remove it from the SwathDefinition too, but let's ignore that for now.

My guess, after looking at the code, is that the CF writer looks at .attrs['coordinates'] to determine which coordinates to associate with each DataArray. This causes 'longitude' and 'latitude' to be assigned as coordinates when the dimension names are actually 'y' and 'x', and that makes xarray unhappy. Since the error only happens when you explicitly ask for longitude and latitude to be loaded, there must be some check like "does this coordinate exist in the variables I know about?" that ignores the coordinate if it doesn't exist.
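The attribute cleanup suggested above can be sketched with plain xarray DataArrays standing in for the loaded Scene datasets (hypothetical data, not the actual satpy objects):

```python
import numpy as np
import xarray as xr

# Stand-ins for the DataArrays a Scene would hold (hypothetical data).
loaded = {
    'latitude': xr.DataArray(np.zeros((2, 3)), dims=('y', 'x'),
                             attrs={'coordinates': ('longitude', 'latitude')}),
    'surface_albedo': xr.DataArray(np.zeros((2, 3)), dims=('y', 'x'),
                                   attrs={'coordinates': ('longitude', 'latitude')}),
}

# Remove the 'coordinates' attribute from every loaded DataArray.
for arr in loaded.values():
    arr.attrs.pop('coordinates', None)
```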

That said, you don't need to explicitly add the longitude and latitude variables to what you load because the CF writer will add them automatically from the SwathDefinition (I think).


zxdawn commented Dec 28, 2020

Thanks for the explanation David.

This approach works well: skip loading lon/lat manually and get them with lon, lat = s5p['surface_albedo'].attrs['area'].get_lonlats().

It looks reasonable to me. Shall we close this issue?


djhoese commented Dec 28, 2020

I'd still like to know why the longitude and latitude datasets don't work. It seems like a bug in the CF writer.

@mraspaud
Member

@ninahakansson @sfinkens might have ideas


sfinkens commented Jan 4, 2021

I guess this has something to do with the additional time dimension of the lat/lon coordinates. For example (I'm trying to reproduce the behaviour of the link_coords method here), this works:

In [20]: a = xr.DataArray([[1,2], [3,4]], dims=('y', 'x'), coords={'x': [1,2], 'y': [3,4]})
In [21]: lon = a.copy()
In [22]: a['lon'] = lon
In [23]: a
Out[23]: 
<xarray.DataArray (y: 2, x: 2)>
array([[1, 2],
       [3, 4]])
Coordinates:
  * x        (x) int64 1 2
  * y        (y) int64 3 4
    lon      (y, x) int64 1 2 3 4

But with an additional time dimension it doesn't:

In [24]: lon = lon.expand_dims('time')
In [25]: a['lon'] = lon
...
ValueError: cannot add coordinates with new dimensions to a DataArray


djhoese commented Jan 4, 2021

But why are the lon/lat coordinates getting a time dimension added to them? Or is that just a side effect of how xarray makes coordinates apply to all variables?


sfinkens commented Jan 4, 2021

Oh, sorry. I thought the lats returned by the reader had an extra time dimension, but I misread the snippet above. It's just a scalar time coordinate, not an extra dimension.


sfinkens commented Jan 4, 2021

Aha, the CF writer adds a time dimension if the dataset has a time coordinate: https://github.com/pytroll/satpy/blob/master/satpy/writers/cf_writer.py#L539 . I don't know why, but I think dropping the time coordinate from the lats/lons

lons = lons.drop('time')

should work around the problem for now.
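In newer xarray versions .drop() is deprecated in favour of .drop_vars(), so the workaround can be written like this (stand-in data, not the actual reader output):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Stand-in for the reader's lons, with a scalar time coordinate attached.
lons = xr.DataArray(np.zeros((2, 3)), dims=('y', 'x'),
                    coords={'time': pd.Timestamp('2019-07-25')})

# Drop the scalar time coordinate before handing the array to the CF writer.
lons = lons.drop_vars('time')
```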

The link_coords method could be fixed by dropping any scalar coordinates [set(lon.coords.keys()).difference(lon.dims)] before using them.


djhoese commented Jan 4, 2021

The link_coords method could be fixed by dropping any scalar coordinates [set(lon.coords.keys()).difference(lon.dims)] before using them.

Maybe not just scalar but any coordinates that aren't also dimensions? Right?
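A sketch of that idea, assuming link_coords would drop every non-dimension coordinate before linking (stand-in data, not the actual satpy code):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Stand-ins: a data variable and a lon coordinate carrying a scalar time coord.
dataset = xr.DataArray(np.zeros((2, 3)), dims=('y', 'x'))
lon = xr.DataArray(np.zeros((2, 3)), dims=('y', 'x'),
                   coords={'time': pd.Timestamp('2019-07-25')})

# Drop every coordinate that is not also a dimension, then link.
# Without this, dataset['lon'] = lon raises
# "ValueError: cannot add coordinates with new dimensions to a DataArray".
extra = set(lon.coords) - set(lon.dims)
dataset['lon'] = lon.drop_vars(extra)
```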


zxdawn commented Jan 5, 2021

@djhoese Yes, any coordinate that isn't also a dimension could lead to the error.

BTW, I'm curious whether the time dimension is useful for MultiScene.
If we drop the time dimension, will MultiScene still work?


djhoese commented Jan 5, 2021

The MultiScene does not depend on or use a time dimension at all. It only depends on the order of the Scenes you provide it. It does have the option of creating an xarray Dataset with a time dimension, where it uses start_time if no time-based dimension is available (if I remember correctly).


zxdawn commented Mar 6, 2021

If I delete the time coordinates, then time_bnds isn't available.

However, as discussed in #1246, we don't need time_bnds at all.

Would it be correct to create a PR that deletes the time coordinates for TROPOMI data?


zxdawn commented Mar 6, 2021

I ran a quick test and found that we need to drop two things to make save_datasets() work:

  1. the time coordinate of all DataArrays

  2. the coordinates attribute of longitude and latitude

If this is OK, I will create a PR.


djhoese commented Mar 6, 2021

Is it possible to rename things so they work better? Otherwise update CF writer to handle these? I'd prefer not to remove information from the reader unless it's redundant. Someone could be using it.


zxdawn commented Mar 7, 2021

Here are some testing steps:

Try to fix by editing yaml and reader

Step 1: Drop the time coordinate

I add these lines:

        # drop the time coordinate for the longitude/latitude variables
        if ds_id['name'] in ['longitude', 'latitude']:
            data = data.drop_vars(['time'])

above

return data

And I got this error:

  File "test_save.py", line 77, in <module>
    s5p.save_datasets(filename='./output/test.nc')
  File "/public/home/zhangxin/new/miniconda3/envs/s5p_tobac/lib/python3.8/site-packages/satpy/scene.py", line 1065, in save_datasets
    return writer.save_datasets(dataarrays, compute=compute, **save_kwargs)
  File "/public/home/zhangxin/new/miniconda3/envs/s5p_tobac/lib/python3.8/site-packages/satpy/writers/cf_writer.py", line 723, in save_datasets
    res = dataset.to_netcdf(filename, engine=engine, group=group_name, mode='a', encoding=encoding,
  File "/public/home/zhangxin/new/miniconda3/envs/s5p_tobac/lib/python3.8/site-packages/xarray/core/dataset.py", line 1689, in to_netcdf
    return to_netcdf(
  File "/public/home/zhangxin/new/miniconda3/envs/s5p_tobac/lib/python3.8/site-packages/xarray/backends/api.py", line 1107, in to_netcdf
    dump_to_store(
  File "/public/home/zhangxin/new/miniconda3/envs/s5p_tobac/lib/python3.8/site-packages/xarray/backends/api.py", line 1142, in dump_to_store
    variables, attrs = conventions.encode_dataset_coordinates(dataset)
  File "/public/home/zhangxin/new/miniconda3/envs/s5p_tobac/lib/python3.8/site-packages/xarray/conventions.py", line 806, in encode_dataset_coordinates
    return _encode_coordinates(
  File "/public/home/zhangxin/new/miniconda3/envs/s5p_tobac/lib/python3.8/site-packages/xarray/conventions.py", line 765, in _encode_coordinates
    written_coords.update(attrs["coordinates"].split())
AttributeError: 'list' object has no attribute 'split'
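That AttributeError comes from xarray's CF encoder calling attrs['coordinates'].split(), which assumes a string, while the attribute here is a list of names. A hypothetical pre-save normalization (stand-in data):

```python
import numpy as np
import xarray as xr

# Stand-in for a loaded variable whose coordinates attribute is a list.
arr = xr.DataArray(np.zeros((2, 3)), dims=('y', 'x'),
                   attrs={'coordinates': ['longitude', 'latitude']})

# xarray's encoder expects a space-separated string, so join the names.
coords = arr.attrs.get('coordinates')
if isinstance(coords, (list, tuple)):
    arr.attrs['coordinates'] = ' '.join(coords)
```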

Step 2: Edit coordinates in the yaml file

I comment out the coordinates entries in the yaml:

datasets:
    latitude:
        name: 'latitude'
        file_type: tropomi_l2
        file_key: 'PRODUCT/latitude'
        #coordinates: [longitude, latitude]
        standard_name: latitude
    longitude:
        name: 'longitude'
        file_type: tropomi_l2
        file_key: 'PRODUCT/longitude'
        #coordinates: [longitude, latitude]
        standard_name: longitude

Then, save_datasets() works well and the NetCDF file looks like this:

netcdf test {
dimensions:
	y = 389 ;
	x = 450 ;
variables:
	float latitude(y, x) ;
.........
	float longitude(y, x) ;
................
	float cloud_fraction_crb_nitrogendioxide_window(y, x) ;
.............
        cloud_fraction_crb_nitrogendioxide_window:coordinates = "longitude latitude" ;
.................

Try to fix by link_coords

If I don't change the yaml or the reader, and only add one print line

print(dataset[coord])

above

dataset[coord] = datas[coord]

then save_datasets() works again, with some warnings:

/public/home/zhangxin/new/miniconda3/envs/s5p_tobac/lib/python3.8/site-packages/satpy/writers/cf_writer.py:260: UserWarning: Coordinate "longitude" referenced by dataset cloud_fraction_crb_nitrogendioxide_window does not exist, dropping reference.
  warnings.warn('Coordinate "{}" referenced by dataset {} does not exist, dropping reference.'.format(
/public/home/zhangxin/new/miniconda3/envs/s5p_tobac/lib/python3.8/site-packages/satpy/writers/cf_writer.py:260: UserWarning: Coordinate "latitude" referenced by dataset cloud_fraction_crb_nitrogendioxide_window does not exist, dropping reference.
  warnings.warn('Coordinate "{}" referenced by dataset {} does not exist, dropping reference.'.format(

The NetCDF file looks like this:

netcdf test {
dimensions:
	time = 1 ;
	y = 389 ;
	x = 450 ;
	bnds_1d = 2 ;
variables:
	int64 time(time) ;
		time:bounds = "time_bnds" ;
		time:standard_name = "time" ;
		time:units = "days since 2019-07-25" ;
		time:calendar = "proleptic_gregorian" ;
	float cloud_fraction_crb_nitrogendioxide_window(y, x) ;
.............
	float longitude(y, x, time) ;
............
	float latitude(time, y, x) ;
...........
	double time_bnds(time, bnds_1d) ;

Thoughts

I suppose the time dimension should be dropped, but I don't know the correct way to handle that in the CF writer ...


djhoese commented Mar 7, 2021

In that last test with the print, are you saying it suddenly works after printing the coordinates?

The summary of my opinion on how to fix this is: If there is something semi-standard in the tropomi data (like a time dimension) that fails when using the CF writer, then the CF writer should be changed to handle it.

zxdawn mentioned this issue Mar 8, 2021