Improve MiRS reader handling of missing metadata #1671

Merged: 38 commits, merged May 19, 2021. The diff below shows changes from 30 commits.
d01b1e0
Merge branch 'iss1387' of github.com:joleenf/satpy
joleenf Oct 23, 2020
aa10285
Merge branch 'master' of https://github.com/pytroll/satpy
joleenf Oct 23, 2020
b2f71df
Merge with upstream development
joleenf Nov 24, 2020
c164f61
Merge remote-tracking branch 'refs/remotes/origin/master'
joleenf Mar 11, 2021
f8e1c42
Merge branch 'master' of https://github.com/pytroll/satpy
joleenf Mar 20, 2021
826122a
Add reader_kwarg to omit limb correction on ATMS sensors.
joleenf Mar 21, 2021
79218d7
Fix docstring in mirs reader class
joleenf Mar 22, 2021
1cdbc0c
add check for calling limb correction only when sensor is atms
joleenf Mar 22, 2021
e783a8c
Fix docstring
joleenf Mar 27, 2021
fc610c8
Add a kwarg test for limb_correction.
joleenf Mar 27, 2021
fdba5fe
Remove unused variables and add a noaa-20 test
joleenf Mar 27, 2021
a66e3bc
Fold reader_kwarg test into basic_load
joleenf Mar 28, 2021
6132c67
Simply if/then statement, split parameterize for readability.
joleenf Mar 30, 2021
f4bdbda
Simplify assertion if/then statement for limb_correction.
joleenf Mar 30, 2021
b0140d1
Remove extra line getting the name of the sensor
joleenf Mar 30, 2021
e5038a5
Getting changes to dask_ewa resampling
joleenf Apr 9, 2021
b9b7a69
Merge remote-tracking branch 'upstream/master'
joleenf Apr 14, 2021
9f85942
Use valid range when present
joleenf Apr 30, 2021
d1a0bb5
Check to confirm that valid range is no longer in attributes.
joleenf Apr 30, 2021
582837d
Add a test to check valid range was applied correctly.
joleenf Apr 30, 2021
879fce4
valid_range is inclusive so include both min/max in acceptable values
joleenf Apr 30, 2021
214ad77
Merge branch 'master' of https://github.com/pytroll/satpy
joleenf May 4, 2021
08a82ce
Add attributes when missing/apply attributes from both file and yaml …
joleenf May 6, 2021
ad76084
Fix _FillValue so that it is read as an integer
joleenf May 11, 2021
b5b3f86
Update reading of yaml
joleenf May 11, 2021
0da2c2c
Merge branch 'main' of https://github.com/pytroll/satpy into mirs_met…
joleenf May 11, 2021
bb9df5a
Test units for TEST_VARS
joleenf May 11, 2021
f4010fc
Add descriptions for some of the variables in the yaml.
joleenf May 12, 2021
6988f3b
BUG FIXES: apply attributes before limb correction and fix typos in yaml
joleenf May 13, 2021
c40ee15
Remove the change to the file_patterns, the extra file_type is not ne…
joleenf May 13, 2021
3e367c8
Add BT to yaml and use in creation of ds_info for data_id
joleenf May 14, 2021
18d3356
commit the yaml mentioned in previous commit
joleenf May 14, 2021
2ad0ba3
Take file_key out of BT dataset so that it does not get carried throu…
joleenf May 14, 2021
ebc2eff
Don't add more in yaml than necessary
joleenf May 14, 2021
fa82afa
Make sure btemp information is only initializing with yaml when neces…
joleenf May 14, 2021
6a5c817
Simplify the reading of coefficients so reading old/versus new coeffi…
joleenf May 15, 2021
1b79da9
Take out check for n_chn and n_fov since they are fixed now.
joleenf May 15, 2021
1649241
Simplify logic which repeatedly checked if file_type matched yaml dat…
joleenf May 18, 2021
103 changes: 102 additions & 1 deletion satpy/etc/readers/mirs.yaml
@@ -25,4 +25,105 @@ file_types:
file_patterns:
- 'IMG_SX.{platform_shortname}.D{start_time:%y%j.S%H%M}.E{end_time:%H%M}.B{num}.WE.HR.ORB.nc'

datasets: {}
datasets:
longitude:
name: longitude
long_name: Longitude of the view (-180,180)
file_type: [ metop_amsu, mirs_atms ]
file_key: Longitude
units: degrees
_FillValue: -999.
valid_range: [ -180., 180. ]
standard_name: longitude
latitude:
name: latitude
long_name: Latitude of the view (-90,90)
file_type: [ metop_amsu, mirs_atms ]
file_key: Latitude
_FillValue: -999.
valid_range: [-90., 90.]
units: degrees
standard_name: latitude
rain_rate:
name: RR
description: Rain Rate
long_name: rain_rate
file_key: RR
file_type: metop_amsu
scale_factor: 0.1
_FillValue: -99.9
Review comment (Member): Both scale_factor and _FillValue aren't provided in the files?

units: mm/hr
coordinates: [longitude, latitude]
mask:
name: Sfc_type
file_key: Sfc_type
file_type: metop_amsu
description: "Surface Type: 0-ocean, 1-sea ice, 2-land, 3-snow"
units: "1"
coordinates: [longitude, latitude]
sea_ice:
name: SIce
description: Sea Ice
long_name: sea_ice
file_key: SIce
file_type: metop_amsu
units: "%"
coordinates: [longitude, latitude]
snow_cover:
name: Snow
description: Snow Cover
long_name: snow_cover
file_key: Snow
file_type: metop_amsu
units: '1'
coordinates: [longitude, latitude]
total_precipitable_water:
name: TPW
description: Total Precipitable Water
long_name: total_precipitable_water
file_key: TPW
file_type: metop_amsu
scale_factor: 0.1
_FillValue: -99.9
units: mm
coordinates: [longitude, latitude]
swe:
name: SWE
description: Snow Water Equivalence
long_name: snow_water_equivalence
file_key: SWE
file_type: metop_amsu
scale_factor: 0.01
_FillValue: -999
units: cm
coordinates: [longitude, latitude]
cloud_liquid_water:
name: CLW
description: Cloud Liquid Water
long_name: Cloud liquid Water
file_key: CLW
file_type: metop_amsu
scale_factor: 0.01
_FillValue: -999
units: mm
coordinates: [longitude, latitude]
skin_temperature:
name: TSkin
description: skin temperature
long_name: Skin Temperature
file_key: TSkin
file_type: metop_amsu
scale_factor: 0.01
_FillValue: -999
units: K
coordinates: [longitude, latitude]
snow_fall_rate:
name: SFR
description: snow fall rate
file_key: SFR
file_type: metop_amsu
long_name: Snow Fall Rate
scale_factor: 0.01
_FillValue: -999
units: mm/hr
coordinates: [longitude, latitude]
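The yaml additions above supply defaults (scale_factor, _FillValue, valid_range, units) for variables whose file metadata is incomplete. A minimal stdlib-only sketch (variable names invented for illustration, not satpy code) of the merge rule the reader applies, where yaml fills gaps but file attributes win:

```python
# Yaml-declared defaults for one dataset, as in the rain_rate entry above.
yaml_defaults = {"units": "mm/hr", "scale_factor": 0.1, "_FillValue": -99.9}
# Attributes actually present in the netCDF file for this variable.
file_attrs = {"scale_factor": 0.01}

# File metadata takes precedence; yaml only fills in what is missing.
ds_info = dict(yaml_defaults)
ds_info.update(file_attrs)

assert ds_info["scale_factor"] == 0.01   # from the file
assert ds_info["_FillValue"] == -99.9    # filled in from yaml
assert ds_info["units"] == "mm/hr"       # filled in from yaml
```

This mirrors the `ds_info.update(data.attrs)` call in `apply_attributes` in the mirs.py changes below.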
87 changes: 57 additions & 30 deletions satpy/readers/mirs.py
@@ -309,7 +309,7 @@ def _get_coeff_filenames(self):

return coeff_fn

def get_metadata(self, ds_info):
def update_metadata(self, ds_info):
"""Get metadata."""
metadata = {}
metadata.update(ds_info)
@@ -334,44 +334,70 @@ def _nan_for_dtype(data_arr_dtype):
return np.nan

@staticmethod
def _scale_data(data_arr, attrs):
# handle scaling
# take special care for integer/category fields
scale_factor = attrs.pop('scale_factor', 1.)
add_offset = attrs.pop('add_offset', 0.)
def _scale_data(data_arr, scale_factor, add_offset):
"""Scale data, if needed."""
scaling_needed = not (scale_factor == 1 and add_offset == 0)
if scaling_needed:
data_arr = data_arr * scale_factor + add_offset
return data_arr, attrs
return data_arr

def _fill_data(self, data_arr, attrs):
try:
global_attr_fill = self.nc.missing_value
except AttributeError:
global_attr_fill = None
fill_value = attrs.pop('_FillValue', global_attr_fill)

fill_out = self._nan_for_dtype(data_arr.dtype)
def _fill_data(self, data_arr, fill_value, scale_factor, add_offset):
"""Fill missing data with NaN."""
if fill_value is not None:
fill_value = self._scale_data(fill_value, scale_factor, add_offset)
fill_out = self._nan_for_dtype(data_arr.dtype)
data_arr = data_arr.where(data_arr != fill_value, fill_out)
return data_arr, attrs
return data_arr

def _apply_valid_range(self, data_arr, attrs):
# handle valid_range
valid_range = attrs.pop('valid_range', None)
def _apply_valid_range(self, data_arr, valid_range, scale_factor, add_offset):
"""Get and apply valid_range."""
if valid_range is not None:
valid_min, valid_max = valid_range
valid_min = self._scale_data(valid_min, scale_factor, add_offset)
valid_max = self._scale_data(valid_max, scale_factor, add_offset)

if valid_min is not None and valid_max is not None:
data_arr = data_arr.where((data_arr >= valid_min) &
(data_arr <= valid_max))
return data_arr, attrs
return data_arr

def apply_attributes(self, data, ds_info):
"""Combine attributes from file and yaml and apply.

File attributes should take precedence over yaml if both are present

"""
try:
global_attr_fill = self.nc.missing_value
except AttributeError:
global_attr_fill = 1.0

# let file metadata take precedence over ds_info from yaml,
# but if yaml has more to offer, include it here, but fix
# units.
ds_info.update(data.attrs)

scale = ds_info.pop('scale_factor', 1.0)
offset = ds_info.pop('add_offset', 0.)
fill_value = ds_info.pop("_FillValue", global_attr_fill)
valid_range = ds_info.pop('valid_range', None)

units_convert = {"Kelvin": "K"}
data_unit = ds_info['units']
ds_info['units'] = units_convert.get(data_unit, data_unit)

data = self._scale_data(data, scale, offset)
data = self._fill_data(data, fill_value, scale, offset)
data = self._apply_valid_range(data, valid_range, scale, offset)

return data, ds_info
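The ordering enforced by `apply_attributes` above matters: the raw `_FillValue` and `valid_range` from the attributes describe unscaled counts, so they must be passed through the same scale/offset as the data before being used to mask it. A hedged, stdlib-only sketch of that pipeline (values invented for illustration):

```python
import math

scale, offset = 0.1, 0.0
raw = [25, -999, 2500]            # stored counts, -999 is the raw fill value
fill_raw = -999
valid = (0.0, 100.0)              # valid_range in scaled units

scaled = [v * scale + offset for v in raw]
fill = fill_raw * scale + offset  # scale the fill value the same way

masked = [float("nan") if (v == fill or not valid[0] <= v <= valid[1]) else v
          for v in scaled]

assert masked[0] == 2.5           # kept: inside valid_range
assert math.isnan(masked[1])      # dropped: matches scaled fill value
assert math.isnan(masked[2])      # dropped: 250.0 exceeds valid_max
```

Comparing the data against the unscaled fill value after scaling would silently fail to mask anything, which is the class of bug the refactored `_fill_data`/`_apply_valid_range` signatures guard against.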

def get_dataset(self, ds_id, ds_info):
"""Get datasets."""
if 'dependencies' in ds_info.keys():
idx = ds_info['channel_index']
data = self['BT']
data, ds_info = self.apply_attributes(data, ds_info)
data = data.rename(new_name_or_name_dict=ds_info["name"])

if self.sensor.lower() == "atms" and self.limb_correction:
@@ -385,19 +411,25 @@ def get_dataset(self, ds_id, ds_info):
data = data[:, :, idx]
else:
data = self[ds_id['name']]
data, ds_info = self.apply_attributes(data, ds_info)

data.attrs = self.update_metadata(ds_info)

data.attrs = self.get_metadata(ds_info)
return data

def _available_if_this_file_type(self, configured_datasets):
handled_vars = set()
for is_avail, ds_info in (configured_datasets or []):
if is_avail is not None:
# some other file handler said it has this dataset
# we don't know any more information than the previous
# file handler so let's yield early
yield is_avail, ds_info
continue
if self.file_type_matches(ds_info['file_type']):
handled_vars.add(ds_info['name'])
yield self.file_type_matches(ds_info['file_type']), ds_info
yield from self._available_new_datasets(handled_vars)

def _count_channel_repeat_number(self):
"""Count channel/polarization pair repetition."""
@@ -433,6 +465,7 @@ def _available_btemp_datasets(self):
'name': new_name,
'description': desc_bt,
'units': 'K',
'scale_factor': self.nc['BT'].attrs['scale_factor'],
Review comment (Member): Why only scale_factor and not add_offset?

'channel_index': idx,
'frequency': "{}GHz".format(normal_f),
'polarization': normal_p,
@@ -447,20 +480,19 @@ def _get_ds_info_for_data_arr(self, var_name):
'name': var_name,
'coordinates': ["longitude", "latitude"]
}

if var_name in ["longitude", "latitude"]:
ds_info['standard_name'] = var_name
return ds_info

def _is_2d_yx_data_array(self, data_arr):
has_y_dim = data_arr.dims[0] == "y"
has_x_dim = data_arr.dims[1] == "x"
return has_y_dim and has_x_dim

def _available_new_datasets(self):
def _available_new_datasets(self, handled_vars):
"""Metadata for available variables other than BT."""
possible_vars = list(self.nc.items()) + list(self.nc.coords.items())
for var_name, data_arr in possible_vars:
if var_name in handled_vars:
continue
if data_arr.ndim != 2:
# we don't currently handle non-2D variables
continue
@@ -479,7 +511,6 @@ def available_datasets(self, configured_datasets=None):

"""
yield from self._available_if_this_file_type(configured_datasets)
yield from self._available_new_datasets()
yield from self._available_btemp_datasets()

def __getitem__(self, item):
@@ -491,10 +522,6 @@ def __getitem__(self, item):

"""
data = self.nc[item]
attrs = data.attrs.copy()
data, attrs = self._scale_data(data, attrs)
data, attrs = self._fill_data(data, attrs)
data, attrs = self._apply_valid_range(data, attrs)

# 'Freq' dimension causes issues in other processing
if 'Freq' in data.coords:
32 changes: 19 additions & 13 deletions satpy/tests/reader_tests/test_mirs.py
@@ -25,12 +25,12 @@
import numpy as np
import xarray as xr

AWIPS_FILE = "IMG_SX.M2.D17037.S1601.E1607.B0000001.WE.HR.ORB.nc"
METOP_FILE = "IMG_SX.M2.D17037.S1601.E1607.B0000001.WE.HR.ORB.nc"
NPP_MIRS_L2_SWATH = "NPR-MIRS-IMG_v11r6_npp_s201702061601000_e201702061607000_c202012201658410.nc"
N20_MIRS_L2_SWATH = "NPR-MIRS-IMG_v11r4_n20_s201702061601000_e201702061607000_c202012201658410.nc"
OTHER_MIRS_L2_SWATH = "NPR-MIRS-IMG_v11r4_gpm_s201702061601000_e201702061607000_c202010080001310.nc"

EXAMPLE_FILES = [AWIPS_FILE, NPP_MIRS_L2_SWATH, OTHER_MIRS_L2_SWATH]
EXAMPLE_FILES = [METOP_FILE, NPP_MIRS_L2_SWATH, OTHER_MIRS_L2_SWATH]

N_CHANNEL = 3
N_FOV = 96
@@ -50,6 +50,8 @@
DS_IDS = ['RR', 'longitude', 'latitude']
TEST_VARS = ['btemp_88v1', 'btemp_88v2',
'btemp_22h', 'RR', 'Sfc_type']
DEFAULT_UNITS = {'btemp_88v1': 'K', 'btemp_88v2': 'K',
'btemp_22h': 'K', 'RR': 'mm/hr', 'Sfc_type': "1"}
PLATFORM = {"M2": "metop-a", "NPP": "npp", "GPM": "gpm"}
SENSOR = {"m2": "amsu-mhs", "npp": "atms", "gpm": "GPI"}

@@ -132,7 +134,7 @@ def _get_datasets_with_attributes(**kwargs):

attrs = {'missing_value': -999.}
ds = xr.Dataset(ds_vars, attrs=attrs)

ds = ds.assign_coords({"Freq": FREQ, "Latitude": latitude, "Longitude": longitude})
return ds


@@ -173,13 +175,13 @@ def _get_datasets_with_less_attributes():

attrs = {'missing_value': -999.}
ds = xr.Dataset(ds_vars, attrs=attrs)

ds = ds.assign_coords({"Freq": FREQ, "Latitude": latitude, "Longitude": longitude})
return ds


def fake_open_dataset(filename, **kwargs):
"""Create a Dataset similar to reading an actual file with xarray.open_dataset."""
if filename == AWIPS_FILE:
if filename == METOP_FILE:
return _get_datasets_with_less_attributes()
return _get_datasets_with_attributes()

@@ -197,7 +199,7 @@ def setup_method(self):
@pytest.mark.parametrize(
("filenames", "expected_loadables"),
[
([AWIPS_FILE], 1),
([METOP_FILE], 1),
([NPP_MIRS_L2_SWATH], 1),
([OTHER_MIRS_L2_SWATH], 1),
]
@@ -217,7 +219,7 @@ def test_reader_creation(self, filenames, expected_loadables):
@pytest.mark.parametrize(
("filenames", "expected_datasets"),
[
([AWIPS_FILE], DS_IDS),
([METOP_FILE], DS_IDS),
([NPP_MIRS_L2_SWATH], DS_IDS),
([OTHER_MIRS_L2_SWATH], DS_IDS),
]
@@ -254,6 +256,11 @@ def _check_valid_range(data_arr, test_valid_range):
assert data_arr.data.min() >= test_valid_range[0]
assert data_arr.data.max() <= test_valid_range[1]

@staticmethod
def _check_fill_value(data_arr, test_fill_value):
assert '_FillValue' not in data_arr.attrs
assert test_fill_value not in data_arr.data

@staticmethod
def _check_attrs(data_arr, platform_name):
attrs = data_arr.attrs
@@ -266,12 +273,7 @@ def _check_attrs(data_arr, platform_name):
@pytest.mark.parametrize(
("filenames", "loadable_ids", "platform_name"),
[
([AWIPS_FILE], TEST_VARS, "metop-a"),
([NPP_MIRS_L2_SWATH], TEST_VARS, "npp"),
([N20_MIRS_L2_SWATH], TEST_VARS, "noaa-20"),
([OTHER_MIRS_L2_SWATH], TEST_VARS, "gpm"),

([AWIPS_FILE], TEST_VARS, "metop-a"),
([METOP_FILE], TEST_VARS, "metop-a"),
([NPP_MIRS_L2_SWATH], TEST_VARS, "npp"),
([N20_MIRS_L2_SWATH], TEST_VARS, "noaa-20"),
([OTHER_MIRS_L2_SWATH], TEST_VARS, "gpm"),
@@ -306,9 +308,13 @@ def test_basic_load(self, filenames, loadable_ids,
if "valid_range" in input_fake_data.attrs:
valid_range = input_fake_data.attrs["valid_range"]
self._check_valid_range(data_arr, valid_range)
if "_FillValue" in input_fake_data.attrs:
fill_value = input_fake_data.attrs["_FillValue"]
self._check_fill_value(data_arr, fill_value)

sensor = data_arr.attrs['sensor']
if reader_kw.get('limb_correction', True) and sensor == 'atms':
fd.assert_called()
else:
fd.assert_not_called()
assert data_arr.attrs['units'] == DEFAULT_UNITS[var_name]
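The final assertions in `test_basic_load` exercise the limb-correction gating added earlier in this PR (commit 826122a): the correction should run only when the sensor is ATMS and the `limb_correction` reader kwarg has not disabled it. A small sketch of that predicate (function name invented for illustration; the reader implements this inline rather than as a helper):

```python
def should_limb_correct(sensor, limb_correction=True):
    """Return True only for ATMS data with limb correction enabled."""
    return bool(limb_correction) and sensor.lower() == "atms"

assert should_limb_correct("atms") is True
assert should_limb_correct("ATMS", limb_correction=False) is False
assert should_limb_correct("amsu-mhs") is False   # non-ATMS sensors skip it
```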