New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add option to replace saturated MODIS L1b values with max valid value #2057
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2057 +/- ##
=======================================
Coverage 93.86% 93.86%
=======================================
Files 283 283
Lines 42328 42379 +51
=======================================
+ Hits 39730 39781 +51
Misses 2598 2598
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
Turns out the uncertainty indexes are 15 in real world files where the pixels are saturated. Current solution is to not check uncertainty if not masking saturated pixels. |
@mraspaud I've been talking to Kathy and @simonrp84 (on slack) and we can't figure out how/why/if the uncertainty should be used. How do you feel about me removing them in favor of the more "traditional" fill value filtering that is/was already used. |
I'll be back at the office on Friday, I'll try to find time to talk to the original author of that test and see if he can remember why he did it in the first place. For now, deactivating the uncertainty check when the new option is set to true seems reasonable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The principle is sound, I just have a couple of suggestions. Also, 65532 fill value seems to have some relation with saturation, should we use it?
Regarding 65532, Simon and I talked about that on slack. That is apparently related to the space view being saturated and is not useful here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this PR. It looks good on the functionality, I just have some style comments.
satpy/readers/modis_l1b.py
Outdated
array = xr.DataArray(from_sds(subdata, chunks=CHUNK_SIZE)[band_index, :, :], | ||
dims=['y', 'x']).astype(np.float32) | ||
valid_range = var_attrs['valid_range'] | ||
array = self._mask_invalid_or_fill_saturated(array, valid_range) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like having one function doing two things here, could we move the if self._mask_saturated
logic here and split that function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could, but it would result in duplicated code as the two options both check valid_min
...nevermind, re-reading it is clear that I could do the masking first as a separate method. I'll see what I can do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I separated this out. I'm not sure if one placement of the if
statement is better than another. Let me know what you think.
satpy/readers/modis_l1b.py
Outdated
# lats=satscene[band].area.lats[indices, :]) | ||
self._add_satpy_metadata(key, projectable) | ||
return projectable | ||
return subdata, uncertainty, var_attrs, band_index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a lot of return values :) Would it be worth having a class for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or build the DataArray here and put eg uncertainty[band_index]
as an ancillary dataset in the attrs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I didn't like this much either, but it wasn't obvious how to do it any other way. It overall seemed better than the huge method that was here before. My goal was to avoid letting datasets
or dataset
(the name of the variable in the file to use) leak to the outer scope, but maybe I make one method that returns dataset and band_index...nah because then I still have to self.sd.select
the dataset again and get the var_attrs.
The problem with making the DataArray here is that it actually doesn't get me much since var_attrs isn't actually applied to the DataArray at all. I'm not sure how I feel about the uncertainty indexes going in the attrs either. I'm leaning towards don't like it 😉
I'll think about doing a separate helper class or something while I work on other stuff today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so it required a tiny bit of duplicated logic (.select
), BUT I think it is overall cleaner. Check it out.
250: ['EV_250_RefSB'], | ||
} | ||
|
||
def __init__(self, filename, filename_info, filetype_info, mask_saturated=True, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the kwargs for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The kwargs are actually needed if this reader (file handler) is loaded at the same time as other readers. This is an unfortunate side effect of how the Scene handles reader keyword arguments and we've done it in other readers. The Scene, when given a dictionary of keyword arguments, will pass them to all readers being loaded. If a file handler/reader doesn't have **kwargs
then any unrecognized keyword arguments will raise an exception.
Recently @gerritholl added the ability to specify the exact reader you want to pass keyword arguments to from the Scene __init__
so maybe this isn't required anymore, but I'd be nervous about deprecating it too fast. I could remove it for this reader with the understanding that I only need to handle mask_saturated
and the assumption that the base HDFEOS file handler has no other keyword arguments. I could then remove the **kwargs
that I added to the base HDFEOS file handler above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I looked at this more and I think we need this in the other file handlers. The base and geo file handlers in hdfeos.py
will also both receive the mask_saturated=True
keyword argument and won't know what to do with it. The easiest way to capture that and ignore it is to use **kwargs
.
Co-authored-by: Martin Raspaud <martin.raspaud@smhi.se>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
CI is still failing, I'm merging main |
I'll fix the failing tests as soon as I can. |
The issues identified by the code analyzers that are actually introduced by this PR aren't things that can easily be changed. Given the approval, I'm going to merge this now. |
As discussed on slack, the MODIS L1b files include a special fill value (65535) that indicates that the detector saturated. This value is only present for band 2. Additionally there is a fill value that means that the value could not be aggregated from the finer resolution version of the band to the 500m or 1km version. This "can't aggregate" value shows up when saturated pixels as well.
This PR adds a new "mask_saturated" kwarg to the
modis_l1b
reader that, by default, replaces these values with NaNs as it is now. IfFalse
, the saturated and can't aggregate fill values are replaced with the max valid value. The name of this kwarg matches the MSI reader (https://github.com/pytroll/satpy/blob/main/satpy/readers/msi_safe.py#L114) which has the same default and behavior.