Add weighted blended stacking to MultiScene (fixes multi-band handling) #2394
Conversation
Codecov Report
```
@@            Coverage Diff             @@
##             main    #2394      +/-   ##
==========================================
+ Coverage   94.71%   94.76%    +0.05%
==========================================
  Files         329      337        +8
  Lines       48739    48922      +183
==========================================
+ Hits        46161    46359      +198
+ Misses       2578     2563       -15
```
Flags with carried forward coverage won't be shown.
... and 38 files with indirect coverage changes
@mraspaud @djhoese @adybbroe so far so good. I cannot help with the 600 line limit. And YES, this was complex (for me). Regarding "code duplication": in my first (and closed) PR 2393 the problem was the nested if/if stuff in two functions. Now I took those functions apart into four to flatten the nesting, and the new complaint is about code duplication. As I already said: this is all at the edge of my knowledge. I can certainly not help should more multiscene unit tests have to be added. I need your help.
@lobsiger thanks a lot for adding this feature, the results you get are very nice!
I have just started with the review, but maybe we can start with that?
There are a lot of places flagged as not tested indeed. I understand that Xarray can be a bit frightening; how about starting with tests using regular numpy arrays first, that we can then adapt to Xarray?
satpy/multiscene.py (Outdated)

```diff
@@ -46,18 +46,29 @@
 log = logging.getLogger(__name__)


-def stack(datasets, weights=None, combine_times=True):
+def stack(datasets, weights=None, combine_times=True, blend_type=1):
     """Overlay a series of datasets together.
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pythonic way to do this is to pass a string rather than a number for the blend type, e.g. "stack_no_weights", "select_from_highest_weight", "blend_with_weights".
However, I wonder if it would be clearer or cleaner to have separate methods for this, so that instead of calling e.g.

```python
multiscene.stack(datasets, weights=five_kilos, blend_type="select_with_weights")
```

we could just call

```python
multiscene.select(datasets, weights=five_kilos)
```

?
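For illustration, the string-keyed dispatch suggested here might look roughly like the following minimal numpy sketch; the function names and bodies are placeholders for the idea in this comment, not the code that ended up in satpy:

```python
import numpy as np

def _stack_no_weights(datasets, weights):
    # Later datasets overwrite earlier ones where they have valid data.
    base = datasets[0].copy()
    for data in datasets[1:]:
        base = np.where(np.isnan(data), base, data)
    return base

def _select_with_weights(datasets, weights):
    # Pick, per pixel, the dataset with the highest weight.
    indices = np.argmax(np.stack(weights), axis=0)
    stacked = np.stack(datasets)
    return np.take_along_axis(stacked, indices[np.newaxis], axis=0)[0]

def _blend_with_weights(datasets, weights):
    # Weighted average of all contributing datasets.
    total = sum(weights)
    return sum(d * w for d, w in zip(datasets, weights)) / total

_BLEND_FUNCS = {
    "stack_no_weights": _stack_no_weights,
    "select_with_weights": _select_with_weights,
    "blend_with_weights": _blend_with_weights,
}

def stack(datasets, weights=None, blend_type="stack_no_weights"):
    try:
        blend_func = _BLEND_FUNCS[blend_type]
    except KeyError:
        raise ValueError(f"Unknown blend_type: {blend_type!r}")
    return blend_func(datasets, weights)
```

An unknown string fails loudly with a `ValueError`, which is one advantage of the string approach over a bare integer flag.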
satpy/multiscene.py (Outdated)

```python
total = weights[0].copy() + 1.e-9
for n in range(1, len(weights)):
    total += weights[n]
```

Suggested change:

```python
total = sum(weights)
```

The `sum` function even has a `start` argument where you could put the offset you have here, like `total = sum(weights, start=1.e-9)`.
However, I think this is not necessary. A division by zero later on will result in a NaN value, which will then be replaced with the `fillna` call.
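As a quick check of this suggestion, the loop and the built-in `sum()` give the same totals on plain numpy arrays (the `start=` keyword form requires Python 3.8+):

```python
import numpy as np

weights = [np.array([0.2, 0.0]), np.array([0.3, 0.0])]

# The loop from the original code ...
total_loop = weights[0].copy() + 1.e-9
for n in range(1, len(weights)):
    total_loop += weights[n]

# ... collapses to the built-in sum(), whose start argument carries the offset:
total = sum(weights, start=1.e-9)
assert np.allclose(total, total_loop)
```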
satpy/multiscene.py (Outdated)

```python
for n in range(1, len(weights)):
    total += weights[n]

datasets0 = []
```

This variable should be renamed to something more understandable...
satpy/multiscene.py (Outdated)

```python
for n in range(0, len(datasets)):
    weights[n] /= total
    datasets0.append(datasets[n].fillna(0))
    datasets0[n][0] *= weights[n]
    datasets0[n][1] *= weights[n]
    datasets0[n][2] *= weights[n]
```

Suggested change:

```python
for weight, dataset in zip(weights, datasets):
    weight /= total
    datasets0.append(dataset * weight)
```

I think the array broadcasting mechanism should take care of the weight multiplication. If that works, it would probably mean that we can merge the single and multiple band cases into one function, right?
Regarding the `fillna` call, I think we should be careful here about what we use as a fill value. In particular, it's probably good to check if the dataset has a defined `_FillValue` attribute already.
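The broadcasting the reviewer is relying on can be demonstrated with plain numpy; the shapes below are illustrative, assuming a (bands, y, x) layout:

```python
import numpy as np

# An RGB dataset laid out as (bands, y, x) and a per-pixel weight of shape (y, x):
rgb = np.ones((3, 4, 5))          # 3 bands, 4x5 pixels
weight = np.full((4, 5), 0.25)

# NumPy broadcasting aligns trailing dimensions, so one multiplication
# scales every band at once; no per-band lines are needed:
weighted = rgb * weight
assert weighted.shape == (3, 4, 5)

# A single-band (y, x) dataset works with the exact same expression,
# which is why the single and multi-band paths can share one function:
single = np.ones((4, 5)) * weight
```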
satpy/multiscene.py (Outdated)

```python
base = datasets0[0].copy()
for n in range(1, len(datasets0)):
    base += datasets0[n]
```

Suggested change:

```python
base = sum(datasets0)
```
@mraspaud thanks a lot for taking the time to look into my code. Be aware that I started with Python in 2018, and I normally do my hobby programming in C the way I did in the 1970s. I have never coded a Python class so far and have just had a first glance at numpy and xarray. So my results might look good, but the code is mostly trial and error and shows my missing knowledge in Python (I only yesterday understood enumerate()). I also have to confess that I do not understand all the (impressive) tests and that the way I understand and use git is a botch. So feel free to hack my code to bits. All I can add is that with my weights w, ww (AzB), www or wwww you can tighten the blending zone, probably with Adam's selection images as the limiting case.
No problem @lobsiger, we all have our backgrounds, and I understand that you come from the C world; that's perfectly fine. Hence my comments in the review regarding how to do things best in python and numpy/xarray. Feel free to check them out, python can be a fantastic tool once you tame it :) As for the tests, I was thinking of something as simple as creating two small weight arrays and two small datasets by hand, and ensuring that the result is as you expect:

```python
weights_1 = np.array([[0, 1, 2], [0, 1, 2], [0, 1, 2]])
weights_2 = np.array([[1.1, 1.1, 1.1], [1.1, 1.1, 1.1], [1.1, 1.1, 1.1]])
dataset_1 = np.full((3, 3), 1)
dataset_2 = np.full((3, 3), 2)
expected = np.array([[2, 2, 1], [2, 2, 1], [2, 2, 1]])
result = stack([dataset_1, dataset_2], [weights_1, weights_2])
np.testing.assert_array_equal(result, expected)
```

Then you can do the same with the other variations of stack, blend, select.
@mraspaud why are the tests cancelled? Did I press some wrong button, or is it just some overload of GitHub?
The rest of the tests were cancelled after the first one crashed. You can see the failure messages for this run here: https://github.com/pytroll/satpy/actions/runs/4234069951/jobs/7355886611#step:10:191 It seems the expected and actual result matrices are a bit different.
@mraspaud quite sure it will fail again. Is this because a test of Adam's for selected weighting is used that was designed for his 'ct', 'cma' integer data only and had no coords=coords, which I must add for datasets with bands? This is completely over my head ...
@pnuu yes, I think this is Adam's test for 'ct' data. It worked when I had two functions for selected weighting: one for single channel data that probably passed Adam's test, and one for data with bands where no test existed yet? Not 100% sure what's going on though ...
Yeah, sorry, haven't really followed this PR so don't know what's going on either 😅 The existing tests shouldn't start failing on new PRs, which would mean that the current behavior is changed. Unless that's the purpose, this shouldn't happen, and if this is the purpose, these tests should be updated. @adybbroe could you check this?
I'm stuck. |
I'm really swamped this week, but I will try to look at this when I get a chance. So far the related mailing list thread and this pull request have moved too fast for the available time I have, so I'm very behind. I'll need to find time to read some of the conversation.
@djhoese thanks a lot for your time. In a nutshell: Adam started this as PR 2275 for his blending with weights of 'ct' and 'cma' type files for GEO and LEO satellites up north. I wanted to use it for multi pass RGB composites of LEO satellites and had a longer monologue on the Google list. I made PR 2393, messed it up, closed it, and reopened this PR 2394 with a lot of trial and error. In the end most of the tests seemed to pass, but there were complaints about too many code lines, too much complexity and untested function lines. @mraspaud made a couple of very useful suggestions that enabled me to write the much cleaner code we have now. I think this is the first time Adam's stacked multiscene with weights unit tests are invoked, and of course they fail, as they are not designed for what I do here. I cannot be of further help regarding these unit tests, as all that code is well over my head.
@mraspaud @djhoese @pnuu I think the main remaining problem with codecov/patch is the renaming/replacing of Adam's stacking with weights function _stack_weighted() with the new name _stack_selected(), the addition of _stack_blended(), and my inability to combine set_weights_to_zero_where_invalid() with the bands case set_weights_to_zero_where_invalid_red(), where I only check the 'R' band. I have described the problem and asked for help on the Google list on February 6th, 2023: https://groups.google.com/g/pytroll/c/N20hwYVYI2s/m/t3qjXxyUAgAJ And of course the main problem is my kind of PR that replaces just too much at a time and thus is also hard to follow.
satpy/multiscene.py (Outdated)

```python
indices = da.argmax(da.dstack(weights), axis=-1)
if "bands" in datasets[0].dims:
    indices = [indices] * datasets[0].sizes["bands"]
```
@lobsiger I refactored this so that regardless of the number of bands in the DataArray (L, LA, RGB, RGBA) it would have indices for each one which I thought was what you wanted here. I'm realizing now, should this ignore alpha bands?
I'll try to work on this PR more tonight or maybe this weekend if I find the time. My plan is to add tests and get good coverage, then refactor the multiscene into a subpackage with something like:
and maybe I'll separate the animation and save_datasets handling into a separate module too. The point is, I'll have at least a couple more commits to do here.
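The per-band index selection discussed in this thread can be sketched with plain numpy standing in for dask (`np.dstack`/`np.argmax` mirror the `da.dstack`/`da.argmax` calls above); `np.choose` is just an illustrative way to broadcast one (y, x) index map over all bands, not the PR's exact code:

```python
import numpy as np

# Two 2x2 weight fields and two RGB datasets shaped (bands, y, x).
weights = [np.array([[1.0, 0.0], [0.0, 1.0]]),
           np.array([[0.0, 1.0], [1.0, 0.0]])]
datasets = [np.full((3, 2, 2), 10.0), np.full((3, 2, 2), 20.0)]

# dstack stacks the (y, x) weights along a new last axis, and argmax
# picks the winning dataset index per pixel:
indices = np.argmax(np.dstack(weights), axis=-1)      # shape (y, x)

# Broadcast the same (y, x) index map over every band, alpha included:
selected = np.choose(indices, datasets)               # shape (bands, y, x)
```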
@djhoese thanks for looking into this. I have been pretty much absorbed for days testing your overlay cache improvements.

> I refactored this so that regardless of the number of bands in the DataArray (L, LA, RGB, RGBA) it would have indices for each one which I thought was what you wanted here. I'm realizing now, should this ignore alpha bands?

@djhoese TBH I was thinking about RGBA but had no clue under what condition I would get such or even other kinds of bands. IIRC all my LEO satellites worked and I found only RGB data. The NWC cloud data seems to still be changing after the move of the compositors. But of course this PR should (if possible at all) finally work for all sorts of bands that we get in a resampled scene.
satpy/multiscene.py (Outdated)

```diff
@@ -46,54 +48,99 @@
 log = logging.getLogger(__name__)


-def stack(datasets, weights=None, combine_times=True):
+def stack(datasets, weights=None, combine_times=True, blend_type='select_with_weights'):
     """Overlay a series of datasets together.
```
I'm going through this code and trying to consolidate logic. I'm having trouble with this `combine_times` argument from @adybbroe's PR. @adybbroe, any memory of why you/we had this `combine_times` keyword argument instead of just always doing it? I think @gerritholl had filed a bug report about regular stacking not combining the times already. I think I'll need to at least make the "no weights" case use this.
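For context, here is a minimal sketch of what "combining the times" could mean, using plain dicts and one common convention (earliest start, latest end); this is an illustration of the idea, not satpy's actual `combine_metadata` behavior:

```python
from datetime import datetime

def combine_times(attrs_list):
    # One possible convention: earliest start and latest end across all inputs.
    return {
        "start_time": min(a["start_time"] for a in attrs_list),
        "end_time": max(a["end_time"] for a in attrs_list),
    }

attrs = combine_times([
    {"start_time": datetime(2023, 2, 1, 10, 0), "end_time": datetime(2023, 2, 1, 10, 15)},
    {"start_time": datetime(2023, 2, 1, 11, 40), "end_time": datetime(2023, 2, 1, 11, 55)},
])
```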
satpy/multiscene.py (Outdated)

```python
indices = da.argmax(da.dstack(weights), axis=-1)
attrs = combine_metadata(*[x.attrs for x in datasets])
base = sum(overlays) / sum(weights, start=1.e-9)
```
@lobsiger I'm not sure I agree with this `start` parameter as a way to handle dividing by 0. What do we think the value should be if the weights sum to 0?
There are different ways to handle a divide by 0, but I'm kind of leaning towards letting it be NaN in the final result.
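A tiny numpy example of the behavior being proposed, where an all-zero-weight (no-data) pixel simply comes out as NaN:

```python
import numpy as np

weights = [np.zeros((2,)), np.zeros((2,))]     # a no-data pixel: all weights are 0
datasets = [np.array([1.0, 1.0]), np.array([2.0, 2.0])]

with np.errstate(invalid="ignore"):
    blended = sum(d * w for d, w in zip(datasets, weights)) / sum(weights)

# 0/0 yields NaN, which downstream code can later treat as a fill value,
# e.g. a transparent alpha band or a user-supplied fill_value.
```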
@djhoese I agree that we should not have start=1e-6 when summing the weights in the denominator. Instead we should divide by 0 outside the data area of passes, probably producing 0/0=NaN (?).

I took the latest version of your multiscene.py and tested all possible combinations of three MetopB passes over area eurol10, similar to what I did here:

I made images for composites 'natural_color', 'ir108_3d' and channel '4' (IR 10.8, used for 'ir108_3d'). I allowed for all 3 defined blend types, generate=True/False, fill_value=0/255/None. This resulted in 3 x 3 x 2 x 3 = 54 different image files without setting start in the denominator. All these images looked as expected. Then I set sum(weights, start=1e-6) and made 18 more images for 'blend_with_weights'. All 18 of these files are problematic, not discovered so far because I mainly looked at RGB composites with fill_value=0. I attach 3 examples: left is the image as expected, right is the wrong image with start=1e-6. The original files produced are *.png, but I reduced them in size and changed them to *.jpg to save space.

Problems with 'blend_with_weights', sum(weights, start=1e-6), generate=False/True does not matter:

- '4' (IR 10.8, almost white data), fill_value=0: No_Data region is BLACK (O.K., but caused by 1e-6)
- '4' (IR 10.8, almost white data), fill_value=255: No_Data region is BLACK (instead of white)
- '4' (IR 10.8, almost white data), fill_value=None: No_Data region is BLACK (instead of transparent)
- 'ir108_3d' (almost black data), fill_value=0: No_Data region is WHITE (instead of black, reversed)
- 'ir108_3d' (almost black data), fill_value=255: No_Data region is WHITE (O.K., but caused by 1e-6)
- 'ir108_3d' (almost black data), fill_value=None: No_Data region is WHITE (instead of transparent)
- 'natural_color' (data look as expected), fill_value=0: No_Data region is BLACK (O.K., but caused by 1e-6)
- 'natural_color' (data look as expected), fill_value=255: No_Data region is BLACK (instead of white)
- 'natural_color' (data look as expected), fill_value=None: No_Data region is BLACK (instead of transparent)
Divide by zero with numpy will generally produce NaNs. In Satpy/trollimage these are considered fill values so they either become transparent via the Alpha band or they get set with your fill_value keyword argument. Based on your comment I think this is what you're seeing. For the start value sum cases, yeah I don't see how that would work properly in real world cases. So I'm not concerned. I think I can continue with the rest of my TODO list then.
Ok remaining TODO:
New question @lobsiger: If you provide an RGB DataArray, do your weights have to also have 3 bands? Or can they be just 2D? Theoretically, your weights could be solar/sensor angle based or specific to the AreaDefinition, so they really don't care about per-band (R, G, or B) conditions. Right?
@djhoese weights are provided by the user as a list of 2d (y, x) arrays that have the shape of the area. Meaningful weights for LEO passes are not defined area specific but rather per instrument swath: https://groups.google.com/g/pytroll/c/N20hwYVYI2s/m/IOKXxZXcAwAJ

In Adam's code -- that is now 'select_with_weights' -- a very different definition of weights is used. If your question is only for building test cases, you can probably directly provide weights as a list of (y, x) arrays with ...

P.S. Regarding the latest multiscene.py: in my 'blend_with_weights' code there is one place where sum(weights) is broadcast (if "bands") in the line ...
and there are two lines in 'select_with_weights' because broadcasting indices did not work with Adam's code.
@djhoese first of all I want to thank you very much for bringing this PR to the point where we are now! The docstring could be slightly enhanced. Adam's ancestor #2275 started with cloud data, and while this PR has ... I also want to thank @mraspaud for his encouragement and for telling the old dog some new tricks.
Thanks @lobsiger. Yeah, I'll try updating that docstring. And yeah I agree, we should avoid testing this PR and this functionality in general with the NWCSAF products until we're sure they're working properly. |
For the docstring, what did you have in mind? You think I should mention RGBA images? |
@djhoese here you made it also work for the LA and RGBA cases; as I understand it, the A band is treated like every other band. Image: MetopB-ir_cloud_day-eurol-blend_with_weights-generateTrue
Ok, the docstring has been updated, but I realized that to make the API docs (on the HTML docs) show this new multiscene structure I needed to tell sphinx/api-docs to generate API docs for private members (things prefixed with `_`). @mraspaud thoughts? I think this PR is ready for review and merge.
LGTM, thanks for reorganising the code!
Multiscene blending revisited
This is a second try after my PR 2393 went wrong. The code of PR 2275 first appeared in Satpy 0.40. I was misled by its title, as it was actually about selecting (choosing) pixels with weights. This was developed by Adam for combining 'ct' data of GEO and LEO satellites up north. My use of the code was for multipass images of LEOs with identical sensors. I have enhanced PR 2275 to handle composites, and I added a function for true blending of overlapping datasets. This also works with common RGB composites.
I propose this naming:

multiscene "stacking" (the default)
(the original way, datasets put on top of each other)

1) multiscene "selecting" with weights
(the pixel with the highest weight is selected from one dataset)

2) multiscene "blending" with weights
(pixels are blended, more than one dataset may contribute)

"Stacking" is the default. When working with a multiscene and a list with weights is provided, the user has an additional way to select case 1) or 2) (and maybe future options ...?). This is handled by the integer keyword argument 'blending_type'.
For 1) I propose to document/use the weighting function w = 180 - satellite_zenith_angle. For 2) I propose a function similar to the flight track dispersion formula as used in the German 'AzB'.

Where new_mscn is the multiscene resampled to the area. This will certainly not work for 'ct', 'cma' type data.
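A hedged sketch of the proposed weighting, with made-up zenith angles; the blending falloff below is only a tunable illustration of the idea of a narrower blending zone, not the AzB formula itself:

```python
import numpy as np

# Hypothetical satellite zenith angles (degrees) across a swath; in
# practice these would come from the pass geometry of each scene.
sat_zen = np.array([[70.0, 40.0, 10.0],
                    [75.0, 45.0, 15.0]])

# Proposed weighting for "selecting": highest weight near nadir.
w = 180.0 - sat_zen

# For "blending", a sharper falloff narrows the blending zone between
# passes; the exponent here is just an illustrative tuning knob.
w_blend = np.cos(np.deg2rad(sat_zen)) ** 4
```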
NOTE: I'm absolutely new to this xarray stuff and will
certainly not be able to write meaningful tests. I just
hope that Adam's tests for PR 2275 can be used somehow.
I attach 3 examples of MetopB multipass over 'eurol10'. A single "channel" image of the weights has been produced at the same time (old busy Xeon, 24GB of RAM).

"stacked" was produced in 2 minutes 23 seconds
"selected" was produced in 9 minutes 33 seconds
"blended" was produced in 2 minutes 16 seconds
Cheers,
Ernst