Add weighted blended stacking to MultiScene (fixes multi-band handling) #2394
Merged
Changes from 17 commits (25 commits in total):
7f878f5 (lobsiger) Rename Adam's stacking function, add set_weights_to_zero_where_invali…
b374ff9 (lobsiger) Adding selecting with bands and true blending.
a7e480b (lobsiger) Fixed indenting stuff an line break.
5a3bca5 (lobsiger) Fixed line break.
6bb4cdd (lobsiger) Cosmetics, maybe should use enumerate() ...
c64d556 (lobsiger) Adapted stack() for two blending functions.
9ff0f94 (lobsiger) Made one blend function out of two.
b0c0701 (lobsiger) Made one select function out of two.
3e110b8 (lobsiger) Added start because this is now after the .fillna() step.
20a7de9 (lobsiger) Just a test to test my test theory.
1ddea03 (lobsiger) Maybe Adams test was not invoked when all passed in my homebrew versi…
e19d301 (lobsiger) Got my first idea of an assert statement.
7ec0057 (djhoese) Reword stack docstring in satpy/multiscene.py
4d4dcb8 (djhoese) Start refactoring new weighted stacking in MultiScene
a655318 (djhoese) Refactor multiscene blend tests to avoid unnecessary test setup
54797c0 (djhoese) Improve consistency between multiscene stack functions
0210ae4 (djhoese) Consolidate some multiscene blend tests
15d8d0c (djhoese) Add initial tests for weighted blended stacking
f2ac7b2 (djhoese) Refactor multiscene blending fixtures
f253d61 (djhoese) Add RGB and float tests to multiscene blend tests
c6d8dea (djhoese) Remove TODOs from multiscene regarding overlay/weight handling
c969ce7 (djhoese) Move multiscene to its own subpackage
53d7c22 (djhoese) Refactor multiscene blend functions to their own module
918260e (djhoese) Make more objects in multiscene module private with `_` prefix
4ad2be8 (djhoese) Update docstring of multiscene stack and fix docstring errors in priv…
satpy/multiscene.py
@@ -1,4 +1,4 @@
 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 # Copyright (c) 2016-2023 Satpy developers
 #
@@ -16,13 +16,15 @@
 # You should have received a copy of the GNU General Public License along with
 # satpy. If not, see <http://www.gnu.org/licenses/>.
 """MultiScene object to work with multiple timesteps of satellite data."""
+from __future__ import annotations

 import copy
 import logging
 import warnings
 from datetime import datetime
 from queue import Queue
 from threading import Thread
+from typing import Callable, Iterable, Mapping, Optional, Sequence

 import dask.array as da
 import numpy as np
@@ -46,66 +48,140 @@
 log = logging.getLogger(__name__)


-def stack(datasets, weights=None, combine_times=True):
-    """Overlay a series of datasets together.
+def stack(
+        datasets: Sequence[xr.DataArray],
+        weights: Optional[Sequence[xr.DataArray]] = None,
+        combine_times: bool = True,
+        blend_type: str = 'select_with_weights'
+) -> xr.DataArray:
+    """Combine a series of datasets in different ways.

     By default, datasets are stacked on top of each other, so the last one applied is
-    on top. If a sequence of weights arrays are provided the datasets will
-    be combined according to those weights. The result will be a composite
-    dataset where the data in each pixel is coming from the dataset having the
-    highest weight.
+    on top. If a sequence of weights (with equal shape) is provided, the datasets will
+    be combined according to those weights. Datasets can be integer category products
+    (ex. cloud type), single channels (ex. radiance), or RGB composites. In the
+    latter case, weights is applied
+    to each 'R', 'G', 'B' coordinate in the same way. The result will be a composite
+    dataset where each pixel is constructed in a way depending on ``blend_type``.

     """
     if weights:
-        return _stack_weighted(datasets, weights, combine_times)
-
-    base = datasets[0].copy()
-    for dataset in datasets[1:]:
+        return _stack_with_weights(datasets, weights, combine_times, blend_type)
+    return _stack_no_weights(datasets, combine_times)


+def _stack_with_weights(
+        datasets: Sequence[xr.DataArray],
+        weights: Sequence[xr.DataArray],
+        combine_times: bool,
+        blend_type: str
+) -> xr.DataArray:
+    blend_func = _get_weighted_blending_func(blend_type)
+    filled_weights = list(_fill_weights_for_invalid_dataset_pixels(datasets, weights))
+    return blend_func(datasets, filled_weights, combine_times)


+def _get_weighted_blending_func(blend_type: str) -> Callable:
+    WEIGHTED_BLENDING_FUNCS = {
+        "select_with_weights": _stack_select_by_weights,
+        "blend_with_weights": _stack_blend_by_weights,
+    }
+    blend_func = WEIGHTED_BLENDING_FUNCS.get(blend_type)
+    if blend_func is None:
+        raise ValueError(f"Unknown weighted blending type: {blend_type}."
+                         f"Expected one of: {WEIGHTED_BLENDING_FUNCS.keys()}")
+    return blend_func


+def _fill_weights_for_invalid_dataset_pixels(
+        datasets: Sequence[xr.DataArray],
+        weights: Sequence[xr.DataArray]
+) -> Iterable[xr.DataArray]:
+    """Replace weight valus with 0 where data values are invalid/null."""
+    has_bands_dims = "bands" in datasets[0].dims
+    for i, dataset in enumerate(datasets):
+        # if multi-band only use the red-band
+        compare_ds = dataset[0] if has_bands_dims else dataset
         try:
-            base = base.where(dataset == dataset.attrs["_FillValue"], dataset)
+            yield xr.where(compare_ds == compare_ds.attrs["_FillValue"], 0, weights[i])
         except KeyError:
-            base = base.where(dataset.isnull(), dataset)
+            yield xr.where(compare_ds.isnull(), 0, weights[i])

-    return base


+def _stack_blend_by_weights(
+        datasets: Sequence[xr.DataArray],
+        weights: Sequence[xr.DataArray],
+        combine_times: bool
+) -> xr.DataArray:
+    """Stack datasets blending overlap using weights."""
+    attrs = _combine_stacked_attrs([data_arr.attrs for data_arr in datasets], combine_times)

-def _stack_weighted(datasets, weights, combine_times):
-    """Stack datasets using weights."""
-    weights = set_weights_to_zero_where_invalid(datasets, weights)
+    overlays = []
+    for weight, overlay in zip(weights, datasets):
+        overlays.append(overlay.fillna(0) * weight)

-    indices = da.argmax(da.dstack(weights), axis=-1)
-    attrs = combine_metadata(*[x.attrs for x in datasets])
+    base = sum(overlays) / sum(weights, start=1.e-9)

+    dims = datasets[0].dims
+    blended_array = xr.DataArray(base, dims=dims, attrs=attrs)
+    return blended_array

-    if combine_times:
-        if 'start_time' in attrs and 'end_time' in attrs:
-            attrs['start_time'], attrs['end_time'] = _get_combined_start_end_times(*[x.attrs for x in datasets])


+def _stack_select_by_weights(
+        datasets: Sequence[xr.DataArray],
+        weights: Sequence[xr.DataArray],
+        combine_times: bool
+) -> xr.DataArray:
+    """Stack datasets selecting pixels using weights."""
+    indices = da.argmax(da.dstack(weights), axis=-1)
+    if "bands" in datasets[0].dims:
+        indices = [indices] * datasets[0].sizes["bands"]
djhoese (review comment on the lines above): @lobsiger I refactored this so that regardless of the number of bands in the DataArray (L, LA, RGB, RGBA) it would have indices for each one, which I thought was what you wanted here. I'm realizing now, should this ignore alpha bands?

+    attrs = _combine_stacked_attrs([data_arr.attrs for data_arr in datasets], combine_times)
     dims = datasets[0].dims
-    weighted_array = xr.DataArray(da.choose(indices, datasets), dims=dims, attrs=attrs)
-    return weighted_array
+    coords = datasets[0].coords
+    selected_array = xr.DataArray(da.choose(indices, datasets), dims=dims, coords=coords, attrs=attrs)
+    return selected_array


-def set_weights_to_zero_where_invalid(datasets, weights):
-    """Go through the weights and set to pixel values to zero where corresponding datasets are invalid."""
-    for i, dataset in enumerate(datasets):
+def _stack_no_weights(
+        datasets: Sequence[xr.DataArray],
+        combine_times: bool
+) -> xr.DataArray:
+    base = datasets[0].copy()
+    collected_attrs = [base.attrs]
+    for data_arr in datasets[1:]:
+        collected_attrs.append(data_arr.attrs)
         try:
-            weights[i] = xr.where(dataset == dataset.attrs["_FillValue"], 0, weights[i])
+            base = base.where(data_arr == data_arr.attrs["_FillValue"], data_arr)
         except KeyError:
-            weights[i] = xr.where(dataset.isnull(), 0, weights[i])
+            base = base.where(data_arr.isnull(), data_arr)

-    return weights
+    attrs = _combine_stacked_attrs(collected_attrs, combine_times)
+    base.attrs = attrs
+    return base


-def _get_combined_start_end_times(*metadata_objects):
+def _combine_stacked_attrs(collected_attrs: Sequence[Mapping], combine_times: bool) -> dict:
+    attrs = combine_metadata(*collected_attrs)
+    if combine_times and ('start_time' in attrs or 'end_time' in attrs):
+        new_start, new_end = _get_combined_start_end_times(collected_attrs)
+        if new_start:
+            attrs["start_time"] = new_start
+        if new_end:
+            attrs["end_time"] = new_end
+    return attrs


+def _get_combined_start_end_times(metadata_objects: Iterable[Mapping]) -> tuple[datetime | None, datetime | None]:
     """Get the start and end times attributes valid for the entire dataset series."""
-    start_time = datetime.now()
-    end_time = datetime.fromtimestamp(0)
+    start_time = None
+    end_time = None
     for md_obj in metadata_objects:
-        if md_obj['start_time'] < start_time:
+        if "start_time" in md_obj and (start_time is None or md_obj['start_time'] < start_time):
             start_time = md_obj['start_time']
-        if md_obj['end_time'] > end_time:
+        if "end_time" in md_obj and (end_time is None or md_obj['end_time'] > end_time):
             end_time = md_obj['end_time']

     return start_time, end_time
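The review discussion below is easier to follow with a concrete call, so here is a minimal usage sketch of the new keyword arguments. It is not part of the PR: the tiny arrays, weights, and values are made up, and `from satpy.multiscene import stack` assumes the module layout before the subpackage move later in this PR.

```python
import dask.array as da
import numpy as np
import xarray as xr

from satpy.multiscene import stack  # import path assumed for this stage of the PR

# Two overlapping single-band "passes"; NaN marks where each pass has no data.
a = xr.DataArray(da.from_array(np.array([[280.0, np.nan], [281.0, 282.0]])), dims=("y", "x"))
b = xr.DataArray(da.from_array(np.array([[290.0, 291.0], [292.0, np.nan]])), dims=("y", "x"))

# Per-pixel weights with the same shape as the data.
wa = xr.DataArray(da.from_array(np.array([[0.8, 0.8], [0.3, 0.3]])), dims=("y", "x"))
wb = xr.DataArray(da.from_array(np.array([[0.2, 0.2], [0.7, 0.7]])), dims=("y", "x"))

# Default: every output pixel is taken from the dataset with the highest weight there.
selected = stack([a, b], weights=[wa, wb], blend_type="select_with_weights")
print(selected.compute().values)   # approximately [[280. 291.] [292. 282.]]

# Alternative: every output pixel is the weighted average of the valid inputs.
blended = stack([a, b], weights=[wa, wb], blend_type="blend_with_weights")
print(blended.compute().values)    # approximately [[282. 291.] [288.7 282.]]
```

With `select_with_weights` each output pixel comes from exactly one input dataset (useful for categorical products), while `blend_with_weights` computes a weighted average, which is what the divide-by-zero discussion below is about. In a real workflow this function would typically be handed to `MultiScene.blend`, for example via `functools.partial(stack, weights=..., blend_type=...)`.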
djhoese: @lobsiger I'm not sure I agree with this `start` parameter as a way to handle dividing by 0. What do we think the value should be if the weights sum to 0? There are different ways to handle a divide by 0, but I'm kind of leaning towards letting it be NaN in the final result.
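To make the trade-off concrete, here is a small numpy-only sketch (an editor's illustration, not code from the PR) of what the `start` offset in `sum(overlays) / sum(weights, start=1.e-9)` does for a pixel that no dataset covers:

```python
import numpy as np

# Pixel 0 is covered by the first dataset only; pixel 1 is covered by neither dataset.
weights = [np.array([1.0, 0.0]), np.array([0.0, 0.0])]
data = [np.array([250.0, 250.0]), np.array([260.0, 260.0])]
overlays = [d * w for d, w in zip(data, weights)]

# With the offset, the uncovered pixel silently becomes 0 (rendered as black):
with_offset = sum(overlays) / sum(weights, start=1.e-9)
print(with_offset)      # approximately [250.   0.]

# Without it, the uncovered pixel is 0/0 = NaN, which Satpy treats as missing data:
with np.errstate(invalid="ignore"):
    without_offset = sum(overlays) / sum(weights)
print(without_offset)   # [250.  nan]
```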
lobsiger: @djhoese I agree that we should not have start=1e-6 when summing the weights in the denominator. Instead we should divide by 0 outside the data area of the passes, probably producing 0/0 = NaN (?). I took the latest version of your multiscene.py and tested all possible combinations of three MetopB passes over area eurol10, similar to what I did here: pytroll/pycoast#95 (comment). I made images for the composites 'natural_color', 'ir108_3d' and channel '4' (IR 10.8, used for 'ir108_3d'). I allowed for all 3 defined blend types, generate=True/False, and fill_value=0/255/None. This resulted in 3 x 3 x 2 x 3 = 54 different image files without setting start in the denominator, and all of these images looked as expected. Then I set sum(weights, start=1e-6) and made 18 more images for 'blend_with_weights'. All 18 of these files are problematic, not discovered so far because I mainly looked at RGB composites with fill_value=0. I attach 3 examples: left is the image as expected, right is the wrong image with start=1e-6. The original files produced are *.png, but I reduced them in size and converted them to *.jpg to save space.

Problems with 'blend_with_weights', sum(weights, start=1e-6), generate=False/True does not matter:

'4' data pale (IR 10.8, almost white), fill_value=0: No_Data region is BLACK (O.K., but caused by 1e-6)
'4' data pale (IR 10.8, almost white), fill_value=255: No_Data region is BLACK (instead of white)
'4' data pale (IR 10.8, almost white), fill_value=None: No_Data region is BLACK (instead of transparent)
'ir108_3d' data dark (almost black), fill_value=0: No_Data region is WHITE (instead of black, reversed)
'ir108_3d' data dark (almost black), fill_value=255: No_Data region is WHITE (O.K., but caused by 1e-6)
'ir108_3d' data dark (almost black), fill_value=None: No_Data region is WHITE (instead of transparent)
'natural_color' data look as expected, fill_value=0: No_Data region is BLACK (O.K., but caused by 1e-6)
'natural_color' data look as expected, fill_value=255: No_Data region is BLACK (instead of white)
'natural_color' data look as expected, fill_value=None: No_Data region is BLACK (instead of transparent)
djhoese: Divide by zero with numpy will generally produce NaNs. In Satpy/trollimage these are considered fill values, so they either become transparent via the alpha band or they get set with your fill_value keyword argument. Based on your comment I think this is what you're seeing. For the start-value sum cases, yeah, I don't see how that would work properly in real-world cases, so I'm not concerned. I think I can continue with the rest of my TODO list then.