Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add channel aliases to the CLAVRx reader to facilitate composites #2441

Merged
merged 38 commits into from
Feb 16, 2024

Conversation

joleenf
Copy link
Contributor

@joleenf joleenf commented Apr 7, 2023

This PR updates the clavrx reader with aliases for some visible channels to allow the use of the cimss cloud type composite and color bar (4-Level) for the cloud mask. Test updates have been made to verify that the aliases are being created correctly.

@djhoese
Copy link
Member

djhoese commented Apr 7, 2023

I wonder if maybe we could have a more descriptive title to this PR? 😜

@joleenf joleenf changed the title Merge branch 'main' of github.com:pytroll/satpy into clavrx_cloud_rgb Add channel aliases to the CLAVRx reader to facilitate composites Apr 7, 2023
… the need for a special lookup within reader

Add tests to make sure aliases are created.
@codecov
Copy link

codecov bot commented Apr 11, 2023

Codecov Report

Attention: 8 lines in your changes are missing coverage. Please review.

Comparison is base (eb4ac0b) 95.40% compared to head (f6d1f1f) 95.91%.
Report is 38 commits behind head on main.

Files Patch % Lines
satpy/readers/clavrx.py 90.69% 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2441      +/-   ##
==========================================
+ Coverage   95.40%   95.91%   +0.50%     
==========================================
  Files         371      373       +2     
  Lines       52825    52946     +121     
==========================================
+ Hits        50399    50782     +383     
+ Misses       2426     2164     -262     
Flag Coverage Δ
behaviourtests 4.15% <0.00%> (-0.01%) ⬇️
unittests 96.01% <97.82%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@coveralls
Copy link

Coverage Status

Coverage: 95.342% (+0.002%) from 95.341% when pulling c6d0122 on joleenf:clavrx_cloud_rgb into 9875d16 on pytroll:main.

data = data.where((data >= valid_min) & (data <= valid_max), fill)
else:
if isinstance(valid_range, np.ndarray):
valid_min = _CLAVRxHelper._scale_data(valid_range[0], factor, offset)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the proper way to patch this _scale_data function and use an assert_called() test?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Easiest might be to extract _scale_data outside the class and then do the mocking. To do it "proper" (read: the way I would prefer) do:

from satpy.readers.clavrx import _scale_data
with mock.patch("satpy.readers.clavrx._scale_data", wraps=_scale_data) as scale_data:
    ...
scale_data.assert_called()

But overall it may be even easier if you just check the final result and make sure it has the scaled values you expect.

@coveralls
Copy link

coveralls commented May 30, 2023

Pull Request Test Coverage Report for Build 7933045894

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • -8 of 367 (97.82%) changed or added relevant lines in 5 files are covered.
  • 7 unchanged lines in 4 files lost coverage.
  • Overall coverage increased (+0.02%) to 95.992%

Changes Missing Coverage Covered Lines Changed/Added Lines %
satpy/readers/clavrx.py 78 86 90.7%
Files with Coverage Reduction New Missed Lines %
satpy/composites/init.py 1 94.83%
satpy/dataset/metadata.py 1 99.26%
satpy/tests/test_readers.py 1 99.36%
satpy/readers/init.py 4 98.65%
Totals Coverage Status
Change from base Build 7726219656: 0.02%
Covered Lines: 50654
Relevant Lines: 52769

💛 - Coveralls

… is important for AWIPS)

It also seems important that integer (flag data) does not have a valid range provided.
Add tests to flag this problem if it appears again.
@joleenf joleenf marked this pull request as ready for review June 5, 2023 20:20
@joleenf joleenf requested a review from mraspaud as a code owner June 5, 2023 20:20
@joleenf joleenf marked this pull request as draft June 13, 2023 18:59
…s behind

Merge branch 'main' of github.com:pytroll/satpy into main
Merge branch 'main' of github.com:pytroll/satpy into clavrx_cloud_rgb
Merge branch 'clavrx_cloud_rgb' of github.com:joleenf/satpy into clavrx_cloud_rgb
Check for a float32 because that is what satpy is expecting
Update valid_range to list if np.ndarray to fix type error in scale_data.
@joleenf joleenf marked this pull request as ready for review February 1, 2024 17:23
@joleenf
Copy link
Contributor Author

joleenf commented Feb 1, 2024

I am not sure I want to do anything about the CodeFactor fail. I am not certain there is anything I should do about the CI/test (3.11, ubuntu-latest) test.

@djhoese
Copy link
Member

djhoese commented Feb 2, 2024

Don't worry about the CI failure. The "true" at the end of that is that it is using experimental versions of dependencies. With numpy 2.0 and some other dependencies we know this is failing.

The CodeFactor thing I would like you to fix. It is complaining about the spacing you used. Let's remove any unnecessary spacing.

@joleenf joleenf requested a review from djhoese February 2, 2024 20:52
@djhoese djhoese added enhancement code enhancements, features, improvements component:readers labels Feb 5, 2024
@djhoese djhoese self-assigned this Feb 5, 2024
Copy link
Member

@djhoese djhoese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple suggestions, but otherwise this looks mostly good to me. I'm wondering also if the tests (which we may have talked about) should be in a separate test_clavrx/ directory or something just to group them together. That way it might be easier to have any shared logic in some kind of _shared.py or _utils.py or something. Just an idea.

@@ -94,9 +108,29 @@ def _get_rows_per_scan(sensor: str) -> Optional[int]:
return None


def _scale_data(data_arr: Union[xr.DataArray, int], scale_factor: float, add_offset: float) -> xr.DataArray:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not requesting a change here, but FYI if you do from __future__ import annotations at the top of the module you can then use xr.DataArray | int here instead of the Union.

res = resolution_from_filename_info
if res.endswith("m"):
return int(res[:-1])
elif res is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elif is unnecessary. Not sure why the linter didn't complain about this. Maybe it isn't enabled on satpy or ruff isn't checking it yet.

Suggested change
elif res is not None:
if res is not None:

But also...if res can be None then won't the previous res.endswith fail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It would.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than setting global attributes, is the nadir resolution used in remapping? I am wondering if it is okay if the nadir resolution is returned as none? There should be enough here to always have a result. I am just wondering if I should just let the reader fail if there is a None value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like self.resolution is used to fill in ds_info["resolution"] in the datasets. This becomes part of the DataArray.attrs but also part of the DataID that Satpy uses to identify each dataset/product. I think it is allowed to be None in most cases when it is unknown or not applicable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an odd mix. Not sure if I should also re-work how the resolution value is extracted from the string. Sometimes I get too fancy, but I was thinking, hypothetically, resolution could be a different unit system. It is highly unlikely, but not impossible.

Is this overboard?

def _get_nadir_resolution(sensor, resolution_from_filename_info):
        """Get nadir resolution."""
        for k, v in NADIR_RESOLUTION.items():
            if sensor.startswith(k):
                return v
        if resolution_from_filename_info is None:
            return None
        elif isinstance(resolution_from_filename_info, int):
            return int(resolution_from_filename_info)
        elif isinstance(resolution_from_filename_info, str):
            print(resolution_from_filename_info)
            result = ''.join([i for i in resolution_from_filename_info if i.isdigit()])
            return int(result)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little simpler:

    def _get_nadir_resolution(sensor, resolution_from_filename_info):
        """Get nadir resolution."""
        for k, v in NADIR_RESOLUTION.items():
            if sensor.startswith(k):
                return v
        if resolution_from_filename_info is None:
            return None
        if isinstance(resolution_from_filename_info, str):
            resolution_from_filename_info = ''.join([i for i in resolution_from_filename_info if i.isdigit()])
        return int(resolution_from_filename_info)

Right now Satpy doesn't support units other than meters in the resolution field...although it also doesn't use it beyond the user requesting a specific resolution. I'd say removing the "m" is probably good enough unless you have evidence that a "123 degrees" or something might exist. Oh I guess there could be "km" but in that case you need to actually convert that or specify the units when assigning the resolution in some way. So maybe it is best to let it fail automatically by only removing the "m" (if isinstance(resolution_from_filename_info, str) and resolution_from_filename_info.startswith("m"):)

"""Scale data, if needed."""
scaling_needed = not (scale_factor == 1.0 and add_offset == 0.0)
if scaling_needed:
data_arr = data_arr * scale_factor + add_offset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you/we need np.float32(scale_factor) (and on the offset) to make sure 64-bit floats aren't created unnecessarily?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

satpy/readers/clavrx.py Show resolved Hide resolved
proj,
ncols,
nlines,
np.asarray(area_extent))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this np.asarray was ever a thing, but area_extent should always be a tuple when passed to AreaDefinition.

Comment on lines 514 to 520
if matches and ds_info.get("resolution") != self.resolution:
# reader knows something about this dataset (file type matches)
# add any information that this reader can add.
new_info = ds_info.copy()
new_info["resolution"] = self.resolution
handled_vars.add(ds_info["name"])
yield self.file_type_matches(ds_info["file_type"]), ds_info
yield from self._available_new_datasets(handled_vars)
yield True, new_info
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the resolution check here be checking for is None? What is the case where we, the file handler, are configured to handle this variable but the resolution is defined as something other than what we contain (currently resolution != self.resolution)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part has been tricky for me to get correct. The structure is essentially following Example 1. Are you suggesting:

matches = self.file_type_matches(ds_info["file_type"])
            if matches and ds_info.get("resolution") != self.resolution:
                # reader knows something about this dataset (file type matches)
                # add any information that this reader can add.
                new_info = ds_info.copy()
                if self.resolution is not None:
                    new_info["resolution"] = self.resolution
                handled_vars.add(ds_info["name"])
                yield True, new_info
        yield from self._available_file_datasets(handled_vars)

It has become a little complex trying to get the aliases to work within the available datasets, but that has nothing to do with the question of resolution, it just adds layers of yield statements.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The git diff is a little complicated now, but I kind of see the reasoning behind the way I made "Example 1". I think the idea is something like MODIS L1b files where each resolution is in a different file (250m, 500m, 1km). So in that case it is very likely that we could be looking at a ds_info that has a resolution different than our own and I suppose the file matches is just a "shortcut" condition to say "yeah, we definitely don't know how to handle this dataset".

I guess what I'm saying is leave it the way you have it if it works for you.

ncols,
nlines,
np.asarray(area_extent))
f"{sensor}_geos",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removes the type np.asarray as requested. Simplifies the description using f-strings.

@joleenf joleenf requested a review from djhoese February 13, 2024 17:30
proj,
ncols,
nlines,
area_extent)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This resolved the np.asarray issue.

Copy link
Member

@djhoese djhoese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need a /__init__.py in your new clavrx test directory. Maybe not because pytest will find it anyway, but we generally think of the test directories as part of the package so they need the init module.

@djhoese
Copy link
Member

djhoese commented Feb 16, 2024

I have a feeling ruff tried to change your apostrophes to double quotes when it thought they were single quotes. I'm hoping it will be OK with my latest commit.

@djhoese djhoese merged commit a1d667d into pytroll:main Feb 16, 2024
17 of 19 checks passed
@djhoese
Copy link
Member

djhoese commented Feb 16, 2024

Thanks for sticking with it. I figure at this point you are the owner of this code so if anything goes wrong you're likely to be the one to identify it and fix it. I tried to review what I could and what I understood. Nice job.

@joleenf
Copy link
Contributor Author

joleenf commented Feb 16, 2024

Thank-you Dave.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component:readers enhancement code enhancements, features, improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants