
satpy resample call -> numby.ndarray deepcopy error #883

Closed
OppesH opened this issue Aug 22, 2019 · 11 comments · Fixed by #1126

OppesH commented Aug 22, 2019

Describe the bug
Resampling of OMPS EDR data does not run. The error comes from deepcopy with numby.ndarray. The data can be load as well as visualized with show command. The same happens with new OMPS SO2 PCA reader (and data files).

Is this a bug or something else? I'm new in satpy and in python in general and thus,... well, I don't know.

To Reproduce

from glob import glob
from satpy import available_readers
from satpy import Scene
from satpy import find_files_and_readers
from satpy.utils import debug_on; debug_on()

filenames = glob("/Users/hassinen/Downloads/OMPS-NPP_NMSO2-L2_2016m0106t062135_o21720_2017m0607t064428.h5")

glbl = Scene(reader="omps_edr", filenames=filenames)
glbl.load(['cldfra'])
glbl.show('cldfra')
lcl = glbl.resample('antarctica', resampler='nearest')
# Save with default names to geotiffs                                                                                                                                
lcl.save_datasets()

Expected behavior
Swath data re-sampled into a regular lat-lon grid

Actual results

[DEBUG: 2019-08-22 15:38:38 : satpy.scene] Setting 'PPP_CONFIG_DIR' to '/Applications/miniconda3/lib/python3.7/site-packages/satpy/etc'
[DEBUG: 2019-08-22 15:38:38 : satpy.readers] Reading ['/Applications/miniconda3/lib/python3.7/site-packages/satpy/etc/readers/omps_edr.yaml']
[DEBUG: 2019-08-22 15:38:39 : satpy.readers.yaml_reader] Assigning to omps_edr: ['/Users/hassinen/Downloads/OMPS-NPP_NMSO2-L2_2016m0106t062135_o21720_2017m0607t064428.h5']
[DEBUG: 2019-08-22 15:38:39 : satpy.composites] Looking for composites config file omps.yaml
[DEBUG: 2019-08-22 15:38:39 : satpy.composites] No composite config found called omps.yaml
[DEBUG: 2019-08-22 15:38:39 : satpy.readers.yaml_reader] No coordinates found for DatasetID(name='longitude_so2_gd', wavelength=None, resolution=50000, polarization=None, calibration=None, level=None, modifiers=())
[DEBUG: 2019-08-22 15:38:39 : satpy.readers.yaml_reader] No coordinates found for DatasetID(name='latitude_so2_gd', wavelength=None, resolution=50000, polarization=None, calibration=None, level=None, modifiers=())
[DEBUG: 2019-08-22 15:38:39 : satpy.resample] Could not add 'crs' coordinate with pyproj<2.0
[DEBUG: 2019-08-22 15:38:39 : satpy.writers] Enhancement configuration options: [{'name': 'stretch', 'method': <function stretch at 0x11235a950>, 'kwargs': {'stretch': 'linear'}}]
[DEBUG: 2019-08-22 15:38:39 : trollimage.xrimage] Applying stretch linear with parameters {}
[DEBUG: 2019-08-22 15:38:39 : trollimage.xrimage] Perform a linear contrast stretch.
[DEBUG: 2019-08-22 15:38:39 : trollimage.xrimage] Calculate the histogram quantiles: 
[DEBUG: 2019-08-22 15:38:39 : trollimage.xrimage] Left and right quantiles: 0.005 0.005
[DEBUG: 2019-08-22 15:38:39 : trollimage.xrimage] Interval: left=<xarray.DataArray (bands: 1)>
array([0.])
Coordinates:
    quantile  float64 0.005
Dimensions without coordinates: bands, right=<xarray.DataArray (bands: 1)>
array([1.])
Coordinates:
    quantile  float64 0.995
Dimensions without coordinates: bands
[DEBUG: 2019-08-22 15:38:39 : satpy.scene] Setting 'PPP_CONFIG_DIR' to '/Applications/miniconda3/lib/python3.7/site-packages/satpy/etc'
[DEBUG: 2019-08-22 15:38:39 : satpy.scene] Resampling DatasetID(name='cldfra', wavelength=None, resolution=50000, polarization=None, calibration=None, level=None, modifiers=())
[INFO: 2019-08-22 15:38:39 : satpy.scene] Not reducing data before resampling.
[DEBUG: 2019-08-22 15:38:39 : satpy.resample] Computing kd-tree parameters
[DEBUG: 2019-08-22 15:38:39 : satpy.resample] Resampling array-27828fbd2b0da94218330941a3bbcf21
Traceback (most recent call last):
  File "/Applications/miniconda3/lib/python3.7/copy.py", line 169, in deepcopy
    rv = reductor(4)
  File "stringsource", line 2, in h5py.h5r.Reference.__reduce_cython__
TypeError: no default __reduce__ due to non-trivial __cinit__

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Applications/miniconda3/lib/python3.7/copy.py", line 161, in deepcopy
    y = copier(memo)
SystemError: <built-in method __deepcopy__ of numpy.ndarray object at 0x1123c97b0> returned a result with an error set

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Applications/miniconda3/lib/python3.7/copy.py", line 141, in deepcopy
    d = id(x)
SystemError: <built-in function id> returned a result with an error set

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "testi_edr.py", line 12, in <module>
    lcl = glbl.resample('euro4', resampler='nearest')
  File "/Applications/miniconda3/lib/python3.7/site-packages/satpy/scene.py", line 1098, in resample
    reduce_data=reduce_data, **resample_kwargs)
  File "/Applications/miniconda3/lib/python3.7/site-packages/satpy/scene.py", line 1053, in _resampled_scene
    res = resample_dataset(dataset, destination_area, **kwargs)
  File "/Applications/miniconda3/lib/python3.7/site-packages/satpy/resample.py", line 1019, in resample_dataset
    new_data = resample(source_area, dataset, destination_area, fill_value=fill_value, **kwargs)
  File "/Applications/miniconda3/lib/python3.7/site-packages/satpy/resample.py", line 982, in resample
    res = resampler_instance.resample(data, **kwargs)
  File "/Applications/miniconda3/lib/python3.7/site-packages/satpy/resample.py", line 395, in resample
    return self.compute(data, cache_id=cache_id, **kwargs)
  File "/Applications/miniconda3/lib/python3.7/site-packages/satpy/resample.py", line 537, in compute
    res = self.resampler.get_sample_from_neighbour_info(data, fill_value)
  File "/Applications/miniconda3/lib/python3.7/site-packages/pyresample/kd_tree.py", line 1213, in get_sample_from_neighbour_info
    attrs=deepcopy(data.attrs))
  File "/Applications/miniconda3/lib/python3.7/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/Applications/miniconda3/lib/python3.7/copy.py", line 306, in _reconstruct
    value = deepcopy(value, memo)
  File "/Applications/miniconda3/lib/python3.7/copy.py", line 161, in deepcopy
    y = copier(memo)
SystemError: <built-in method __deepcopy__ of numpy.ndarray object at 0x1123c9710> returned a result with an error set

Environment Info:

  • OSX 10.14.3
  • Satpy Version: 0.16.1
  • PyResample Version: 1.12.3


djhoese (Member) commented Aug 22, 2019

If you run conda list numpy and conda list xarray, what do you get?

Could you also do print(glbl['cldfra'].attrs) after calling .load and let me know what you get? I've never seen this error before but it seems that there is something wrong with a numpy array in the cldfra metadata that isn't constructed well. It is pretty rare (in my experience) for a numpy array to show up in the metadata attributes.

OppesH (Author) commented Aug 22, 2019


# packages in environment at /Applications/miniconda3:
#
# Name                    Version                   Build  Channel
numpy                     1.17.0           py37h6b0580a_0    conda-forge

and

# packages in environment at /Applications/miniconda3:
#
# Name                    Version                   Build  Channel
xarray                    0.12.3                     py_0    conda-forge

and print(glbl['cldfra'].attrs)

OrderedDict([('start_orbit', 21720), ('end_orbit', 21720), ('file_key', 'ScienceData/CloudFraction'), ('units', '1'), ('modifiers', ()), ('level', None), ('long_name', b'Cloud Fraction'), ('name', 'cldfra'), ('shape', (400, 36)), ('resolution', 50000), ('calibration', None), ('valid_range', array([0., 1.], dtype=float32)), ('standard_name', 'cldfra'), ('_FillValue', -1.2676506e+30), ('coordinates', ('longitude_so2_gd', 'latitude_so2_gd')), ('sensor', 'OMPS'), ('coverage_content_type', b'physicalMeasurement'), ('wavelength', None), ('polarization', None), ('file_type', 'omps_tc_so2_edr_ges_disc'), ('DIMENSION_LIST', array([array([<HDF5 object reference>], dtype=object),
       array([<HDF5 object reference>], dtype=object)], dtype=object)), ('DIMENSION_LABELS', array([b'DimAlongTrack', b'DimCrossTrack'], dtype=object)), ('file_units', '1'), ('platform_name', 'NPP'), ('start_time', datetime.datetime(2016, 1, 6, 6, 21, 35)), ('end_time', datetime.datetime(2017, 6, 7, 6, 44, 28)), ('area', <pyresample.geometry.SwathDefinition object at 0x118f05240>), ('ancillary_variables', [])])

There seems to be something strange in start_ and end_times. I also have tried several different variables (file_key) and the result is the same. Maybe something in my python installation is not right? But, I got the same output with Anaconda and Miniconda installations (as expected).

djhoese (Member) commented Aug 22, 2019

Try, after .load, doing glbl['cldfra'].attrs.pop('DIMENSION_LIST') and glbl['cldfra'].attrs.pop('DIMENSION_LABELS'). This will remove those elements from the metadata .attrs. It looks like these are being opened/read as numpy arrays of HDF5 file objects. It's possible that the file format has changed since I last modified the reader and these new attributes are causing issues.
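The mechanics of this workaround can be sketched on a plain dict standing in for `glbl['cldfra'].attrs` (running the real thing needs the OMPS file; the `object()` values are just placeholders for the numpy arrays of HDF5 object references):

```python
from copy import deepcopy

attrs = {
    'name': 'cldfra',
    'units': '1',
    # placeholders for the numpy arrays of HDF5 object references:
    'DIMENSION_LIST': object(),
    'DIMENSION_LABELS': object(),
}

# .pop(key, None) removes the key if present and is a no-op otherwise,
# so this is safe to run even on files that lack these attributes
attrs.pop('DIMENSION_LIST', None)
attrs.pop('DIMENSION_LABELS', None)

deepcopy(attrs)  # with the HDF5-specific entries gone, deepcopy succeeds
```

Using `pop` with a default avoids a KeyError when processing a mix of old- and new-format files.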

OppesH (Author) commented Aug 22, 2019

Great, and thank you a lot! This fixed the issue.
I also saw that 'end_time' is taken from ProductionDateTime although it should come from RangeEndingDateTime in the h5 file. So it seems that something has changed.

djhoese (Member) commented Aug 22, 2019

Oh that is odd; an OMPS pass spanning more than a year. Would it be possible for you to put the files you are using somewhere I can get to them? Where did you get them from?

OppesH (Author) commented Aug 22, 2019

I got those files from NASA. There have been some file format changes over the past years. And yes, we have an FTP server, but I have to send the password via e-mail.

In this case the orbit does not span over a year - this was a reprocessed file. The file specifies the following time variables:

ProductionDateTime
RangeBeginningDateTime
RangeEndingDateTime

The output from print(glbl['cldfra'].attrs) gave end_time using ProductionDateTime although it should come from the RangeEndingDateTime variable, whereas start_time correctly comes from RangeBeginningDateTime.

djhoese (Member) commented Aug 22, 2019

So I'm looking at the code and I figured out what's going on with the times at least. The start/end time are actually coming from the filename and I incorrectly marked the creation time as the end time. I can fix that.

For the weird attributes, what do you get for conda list hdf5 and conda list h5py?

djhoese (Member) commented Aug 22, 2019

Also, you mentioned "PCA" products. I think I've found them online, but were you able to read them with current Satpy?

djhoese (Member) commented Aug 22, 2019

Can you point me to the exact NASA website where you got these from? I was going to fix this today, but have discovered something annoying. These new files seem to be NetCDF4 files that are being labeled as .h5 files. The NetCDF4 standard now defines these DIMENSION_LIST attributes, among others, to help it define dimensions for the data; something that is very common/standard for NetCDF4 files. Reading these files as HDF5 files, although possible, would require re-implementing everything the NetCDF4 library does to handle these special attributes. The files should be read as NetCDF4 files, but unfortunately the OMPS reader in Satpy was written for the older HDF5 files.

Fixing the reader to read these files is a larger task than I had thought and I will not have the time to do so. Fixing the reader would entail:

  1. Create new file types in the YAML with new file patterns for the new files. If the new filenames have exactly the same name as the old files then the following steps become a little more complicated as the python code will have to determine which type of file it is reading and open it accordingly.
  2. Create new file handler classes that are based on the NetCDF4FileHandler utility classes and that use xarray to open the files. This will create DataArray objects from the files automatically.
  3. Reimplement the necessary methods on the file handler to fill in the information necessary to work with Satpy.

This becomes almost like writing a whole new reader, but is the best option in the long run.
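The step-1 dispatch problem (deciding at runtime which kind of file the reader is looking at) could be sketched roughly as below. This is illustrative only, not actual Satpy code: the '_NCProperties' root attribute is written by recent netCDF-C versions, and the DIMENSION_LIST check keys off the attributes this thread found in the metadata; treating either as a reliable marker for these particular files is an assumption.

```python
def looks_like_netcdf4(root_attrs, dataset_attrs):
    """Guess whether an .h5 file is really a NetCDF4 file.

    ``root_attrs`` is the file's root attribute mapping and
    ``dataset_attrs`` an iterable of per-variable attribute mappings.
    """
    # netCDF-C (>= 4.4.1) stamps files with a '_NCProperties' root attribute
    if '_NCProperties' in root_attrs:
        return True
    # NetCDF4 keeps its dimension bookkeeping in DIMENSION_LIST /
    # DIMENSION_LABELS attributes attached to each variable
    return any('DIMENSION_LIST' in attrs for attrs in dataset_attrs)
```

A hypothetical file handler factory could use this to route a file to a NetCDF4FileHandler-based class or to the existing HDF5-based one when the filename patterns alone cannot distinguish them.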

OppesH (Author) commented Aug 23, 2019

Yes, I can. The direct link for 2019 data is https://snpp-omps.gesdisc.eosdis.nasa.gov/data/SNPP_OMPS_Level2/OMPS_NPP_NMSO2_L2.2/2019/109/ but you need registration. The front page is https://disc.gsfc.nasa.gov/datasets/OMPS_NPP_NMSO2_L2_V2/summary.
This is just for the GES DISC OMPS-NPP NMSO2 files; maybe the EDR files are fine, I haven't tested yet. And yes, the new file names seem to be the same as the old ones, because I didn't edit the reader file patterns (in omps_edr.yaml) at all for file_type: omps_tc_so2_edr_ges_disc.

Regarding PCA - those are quite new files (PCA - Principal Component Algorithm), implemented just in 2018. I'm interested in developing a reader for them because we have a Direct Readout station for OMPS and we are processing PCA SO2 products there. But the TO3 file formats are also usable for ozone etc.
Once again, thank you for your help, I appreciate it a lot.

OppesH (Author) commented Aug 23, 2019

To your earlier questions: conda list hdf5 gives

# packages in environment at /Applications/miniconda3:
#
# Name                    Version                   Build  Channel
hdf5                      1.10.5          nompi_h15a436c_1103    conda-forge

and conda list h5py gives

# packages in environment at /Applications/miniconda3:
#
# Name                    Version                   Build  Channel
h5py                      2.9.0           nompi_py37h6248fd5_1104    conda-forge

I noticed just today that gdal wasn't the conda-forge version, but updating it didn't change anything.
