Multiscene blend with weights #2275

Merged (34 commits, Jan 30, 2023)

Changes from 2 commits

Commits
1ec54cf
Extend blend method and add a new blend function using weights
Nov 12, 2022
7eba546
Refactored new weighted function and accompanying unittests …
Nov 14, 2022
a7d5ee7
Fixing the stacked blend functions taking care of fill_value and impr…
Nov 16, 2022
d4bec06
Improve unittest code: Use fixture with autouse to avoid hardcoding c…
Nov 17, 2022
6a3b52e
Change standard_name for cloudtype to ct
Dec 21, 2022
d20d4de
Merge branch 'main' into multiscene_blend_with_weights
Jan 13, 2023
51df85e
Refactor and maintain one stack function available publicly, and make…
Jan 23, 2023
d32eeeb
Use setup method rather than autouse fixture
Jan 23, 2023
4091526
Merge branch 'main' into multiscene_blend_with_weights
Jan 23, 2023
1abfc5a
Re-arrange multiscene tests into three modules in a sub-directory
Jan 23, 2023
b28df3f
Update satpy/modifiers/angles.py
adybbroe Jan 24, 2023
e262e12
Update satpy/multiscene.py
adybbroe Jan 24, 2023
24b9153
Improve doc string in satpy/multiscene.py
adybbroe Jan 24, 2023
695b98d
Clarify doc string in satpy/multiscene.py
adybbroe Jan 24, 2023
6627d48
Update satpy/tests/multiscene_tests/test_save_animation.py
adybbroe Jan 24, 2023
80ff2a0
Make sure module headers are consistent
adybbroe Jan 24, 2023
818b0f1
Update satpy/tests/multiscene_tests/test_blend.py
adybbroe Jan 24, 2023
c2a90d8
Update copyright in module header
adybbroe Jan 24, 2023
248d449
Update satpy/multiscene.py
adybbroe Jan 24, 2023
f4b0874
Use attrs keys to access the fill value
adybbroe Jan 24, 2023
0def84a
Remove duplicate code, and use make_dataid from utils instead
Jan 24, 2023
3b718ab
Merge branch 'multiscene_blend_with_weights' of github.com:adybbroe/s…
Jan 24, 2023
505aa24
Moving test utilities from __init__ to separate module
Jan 24, 2023
2d38257
Merge branch 'main' into multiscene_blend_with_weights
Jan 25, 2023
b5adf6a
Minor refactoring of the blend/stack functions and add combining of t…
Jan 26, 2023
5e91383
Add a bit of documentation on using weights when blending with the sta…
Jan 26, 2023
897df3d
Improve formulation on the motive of blending geo and polar scenes
adybbroe Jan 27, 2023
0b187dd
Fix typo
adybbroe Jan 27, 2023
c987b5d
Refactor function using one for loop for efficiency and make private
Jan 27, 2023
933de7b
Fix code example for blending with weights - getting the area-id corre…
Jan 27, 2023
3c125dc
Only try combine start and end times if they are already there
Jan 27, 2023
238c71c
Revert back on standard_name for cloudtype dataset
Jan 27, 2023
c76e790
Be more explicit in the doc pages when storing the blended cloudtype …
Jan 27, 2023
8baca79
Fix minor editorial in RTDs
Jan 27, 2023
24 changes: 17 additions & 7 deletions satpy/multiscene.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright (c) 2016-2019 Satpy developers
# Copyright (c) 2016-2019, 2022 Satpy developers
#
# This file is part of satpy.
#
@@ -45,6 +45,16 @@
log = logging.getLogger(__name__)


def weighted(datasets, weights=None):
"""Blend datasets using weights."""
indices = da.argmax(da.dstack(weights), axis=-1)
dims = datasets[0].dims
attrs = datasets[0].attrs
djhoese (Member):

We may have discussed this already, but first or last dataset? Or should it be a merge of the two? That way start_time and end_time should be handled "properly".

adybbroe (Author):

Yes, I think we briefly discussed it. It is not obvious to me what you should do to make it intuitive and right, which is why I would prefer to keep it as simple as possible. It feels like it will depend on the use case: it may be that some scenes do not contribute at all to the final product even though they have been part of the stacking/blending process. Sounds like we should perhaps make a function that does this blending of attributes; that way at least it will be more explicit in the code for future adaptations/changes. Then I would suggest we keep what that function actually does simple for now. I would go for special treatment of the scene start and end times, and take the rest of the attributes from either the first or the last scene? What else, besides the times, needs special care?

djhoese (Member):

    def combine_metadata(*metadata_objects, average_times=True):
        """Combine the metadata of two or more Datasets.

        If the values corresponding to any keys are not equal or do not
        exist in all provided dictionaries then they are not included in
        the returned dictionary. By default any keys with the word 'time'
        in them and consisting of datetime objects will be averaged. This
        is to handle cases where data were observed at almost the same time
        but not exactly. In the interest of time, lazy arrays are compared by
        object identity rather than by their contents.

        Args:
            *metadata_objects: MetadataObject or dict objects to combine
            average_times (bool): Average any keys with 'time' in the name

        Returns:
            dict: the combined metadata
        """

The function already exists ^. But it looks like it doesn't handle start/end time the way I thought it would. That special handling is apparently only in the internal reader code.
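For illustration, a minimal sketch of what that existing behaviour implies, based only on the docstring above (the import path is an assumption and may differ between Satpy versions):

    from datetime import datetime

    from satpy.dataset import combine_metadata  # assumed import path

    attrs_a = {"platform_name": "NOAA-20", "start_time": datetime(2023, 1, 1, 12, 0)}
    attrs_b = {"platform_name": "NOAA-20", "start_time": datetime(2023, 1, 1, 12, 2)}

    # Equal values are kept; 'time' keys holding datetimes are averaged by default.
    combined = combine_metadata(attrs_a, attrs_b)
    # combined == {'platform_name': 'NOAA-20', 'start_time': datetime(2023, 1, 1, 12, 1)}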

adybbroe (Author):

Nice!

Now I have changed the tests so that the blended scene has the start and end times set to the oldest start_time and the newest end_time of the scenes in the weighted stack. And the tests pass with this code in the stack function:

    attrs = combine_metadata(*[x.attrs for x in datasets])

    start_time = datetime.now()
    for s_time in [x.attrs['start_time'] for x in datasets if 'start_time' in x.attrs]:
        if s_time < start_time:
            start_time = s_time

    end_time = datetime.fromtimestamp(0)
    for e_time in [x.attrs['end_time'] for x in datasets if 'end_time' in x.attrs]:
        if e_time > end_time:
            end_time = e_time

    attrs['start_time'] = start_time
    attrs['end_time'] = end_time

I should of course put that time calculation in a separate function. But I suppose you would like to have a combine_metadata function that also handles the times as expected?
Here I am not averaging but looking for the time span that covers all the data going into the blended scene.

Also, do you think I should handle the case where a scene does not have a start and an end time, and if so, how?
I would tend to require in that case that the user provides scenes that have start/end times! Or should all scenes have them (a requirement on all readers)?
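(That factored-out helper could be as small as the sketch below; the name is hypothetical, and it assumes at least one dataset carries each key:)

    def _get_combined_time_span(datasets):
        """Return the earliest start_time and latest end_time over all datasets (sketch)."""
        start_times = [ds.attrs['start_time'] for ds in datasets if 'start_time' in ds.attrs]
        end_times = [ds.attrs['end_time'] for ds in datasets if 'end_time' in ds.attrs]
        return min(start_times), max(end_times)

Using min/max on the datetimes directly would also remove the need for the datetime.now() and datetime.fromtimestamp(0) sentinels above.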

adybbroe (Author):

@djhoese I feel like this is growing, so I hope I can finish it soon; then we could add more complexity and more clever handling of attributes etc. in other/future PRs when that becomes needed. Does that sound reasonable?

djhoese (Member):

I am OK with keeping this as the first Scene's metadata for now. If you have the time, I think combine_metadata would at least be more accurate. If you also have the time, a future PR changing combine_metadata to use min/max for start/end time would be appreciated. I know the conversation has come up in the past, so hopefully I'm not forgetting some reason why it isn't already that way.

    weighted_array = xr.DataArray(da.choose(indices, datasets),
                                  dims=dims, attrs=attrs)
    return weighted_array
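As a side note, the selection logic above can be exercised in isolation with a toy sketch (made-up arrays, not part of this PR): da.argmax over the depth-stacked weights yields, per pixel, the index of the dataset with the largest weight, and da.choose then takes each pixel from that dataset:

    import dask.array as da
    import numpy as np

    # Two toy 2x2 "datasets" and their per-pixel weights (made-up values).
    data_a = da.from_array(np.full((2, 2), 1))
    data_b = da.from_array(np.full((2, 2), 2))
    wgt_a = da.from_array(np.array([[3, 0], [3, 0]]))  # data_a wins the first column
    wgt_b = da.from_array(np.array([[0, 5], [0, 5]]))  # data_b wins the second column

    # Index of the dataset with the largest weight, per pixel.
    indices = da.argmax(da.dstack([wgt_a, wgt_b]), axis=-1)
    # Take each pixel from the winning dataset.
    blended = da.choose(indices, [data_a, data_b])
    print(blended.compute())  # [[1 2]
                              #  [1 2]]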


def stack(datasets):
    """Overlay series of datasets on top of each other."""
    base = datasets[0].copy()
@@ -339,7 +349,7 @@ def resample(self, destination=None, **kwargs):
"""Resample the multiscene."""
return self._generate_scene_func(self._scenes, 'resample', True, destination=destination, **kwargs)

    def blend(self, blend_function=stack):
    def blend(self, blend_function=stack, **kwargs):
"""Blend the datasets into one scene.

Reduce the :class:`MultiScene` to a single :class:`~satpy.scene.Scene`. Datasets
@@ -364,7 +374,7 @@ def blend(self, blend_function=stack):
        common_datasets = self.shared_dataset_ids
        for ds_id in common_datasets:
            datasets = [scn[ds_id] for scn in self.scenes if ds_id in scn]
            new_scn[ds_id] = blend_function(datasets)
            new_scn[ds_id] = blend_function(datasets, **kwargs)

        return new_scn
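With this kwargs pass-through, extra arguments given to blend() reach the blend function, so weighted blending at this stage of the PR would be driven roughly as in the sketch below (the scenes and weight arrays are placeholders and must already share an area and shape; later commits refactor weighted() into the public stack() function):

    from satpy.multiscene import MultiScene, weighted

    mscn = MultiScene([scene_a, scene_b])  # placeholder scenes on a common area
    weights = [weights_a, weights_b]       # one weight array per scene, same shape as the data
    blended_scene = mscn.blend(blend_function=weighted, weights=weights)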

@@ -508,7 +518,7 @@ def _get_single_frame(self, ds, enh_args, fill_value):
enh_args = enh_args.copy() # don't change caller's dict!
if "decorate" in enh_args:
enh_args["decorate"] = self._format_decoration(
ds, enh_args["decorate"])
ds, enh_args["decorate"])
img = get_enhanced_image(ds, **enh_args)
data, mode = img.finalize(fill_value=fill_value)
if data.ndim == 3:
@@ -632,7 +642,7 @@ def _get_writers_and_frames(
info_datasets = [scn.get(dataset_id) for scn in info_scenes]
this_fn, shape, this_fill = self._get_animation_info(info_datasets, filename, fill_value=fill_value)
data_to_write = self._get_animation_frames(
all_datasets, shape, this_fill, ignore_missing, enh_args)
all_datasets, shape, this_fill, ignore_missing, enh_args)

writer = imageio.get_writer(this_fn, **imio_args)
frames[dataset_id] = data_to_write
@@ -703,8 +713,8 @@ def save_animation(self, filename, datasets=None, fps=10, fill_value=None,
raise ImportError("Missing required 'imageio' library")

(writers, frames) = self._get_writers_and_frames(
filename, datasets, fill_value, ignore_missing,
enh_args, imio_args={"fps": fps, **kwargs})
filename, datasets, fill_value, ignore_missing,
enh_args, imio_args={"fps": fps, **kwargs})

client = self._get_client(client=client)
# get an ordered list of frames
136 changes: 110 additions & 26 deletions satpy/tests/test_multiscene.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright (c) 2018 Satpy developers
# Copyright (c) 2018, 2022 Satpy developers
#
# This file is part of satpy.
#
@@ -24,8 +24,10 @@
from datetime import datetime
from unittest import mock

import dask.array as da
import pytest
import xarray as xr
from pyresample.geometry import AreaDefinition

from satpy import DataQuery
from satpy.dataset.dataid import DataID, ModifierTuple, WavelengthRange
@@ -166,7 +168,7 @@ def test_from_files(self):
"OR_ABI-L1b-RadC-M3C01_G16_s20171171517203_e20171171519577_c20171171520019.nc",
"OR_ABI-L1b-RadC-M3C01_G16_s20171171522203_e20171171524576_c20171171525020.nc",
"OR_ABI-L1b-RadC-M3C01_G16_s20171171527203_e20171171529576_c20171171530017.nc",
]
]
input_files_glm = [
"OR_GLM-L2-GLMC-M3_G16_s20171171500000_e20171171501000_c20380190314080.nc",
"OR_GLM-L2-GLMC-M3_G16_s20171171501000_e20171171502000_c20380190314080.nc",
@@ -179,9 +181,9 @@
]
with mock.patch('satpy.multiscene.Scene') as scn_mock:
mscn = MultiScene.from_files(
input_files_abi,
reader='abi_l1b',
scene_kwargs={"reader_kwargs": {}})
input_files_abi,
reader='abi_l1b',
scene_kwargs={"reader_kwargs": {}})
assert len(mscn.scenes) == 6
calls = [mock.call(
filenames={'abi_l1b': [in_file_abi]},
@@ -192,11 +194,11 @@
scn_mock.reset_mock()
with pytest.warns(DeprecationWarning):
mscn = MultiScene.from_files(
input_files_abi + input_files_glm,
reader=('abi_l1b', "glm_l2"),
group_keys=["start_time"],
ensure_all_readers=True,
time_threshold=30)
input_files_abi + input_files_glm,
reader=('abi_l1b', "glm_l2"),
group_keys=["start_time"],
ensure_all_readers=True,
time_threshold=30)
assert len(mscn.scenes) == 2
calls = [mock.call(
filenames={'abi_l1b': [in_file_abi], 'glm_l2': [in_file_glm]})
@@ -206,11 +208,11 @@
scn_mock.assert_has_calls(calls)
scn_mock.reset_mock()
mscn = MultiScene.from_files(
input_files_abi + input_files_glm,
reader=('abi_l1b', "glm_l2"),
group_keys=["start_time"],
ensure_all_readers=False,
time_threshold=30)
input_files_abi + input_files_glm,
reader=('abi_l1b', "glm_l2"),
group_keys=["start_time"],
ensure_all_readers=False,
time_threshold=30)
assert len(mscn.scenes) == 12


@@ -531,7 +533,7 @@ def test_crop(self):
x_size // 2,
y_size // 2,
area_extent,
)
)
scene1["1"] = DataArray(np.zeros((y_size, x_size)))
scene1["2"] = DataArray(np.zeros((y_size, x_size)), dims=('y', 'x'))
scene1["3"] = DataArray(np.zeros((y_size, x_size)), dims=('y', 'x'),
@@ -562,19 +564,29 @@ def setUp(self):
        import dask.array as da
        import xarray as xr
        from pyresample.geometry import AreaDefinition
        shape = (8, 12)
        area = AreaDefinition('test', 'test', 'test',
                              {'proj': 'geos', 'lon_0': -95.5, 'h': 35786023.0},
                              2, 2, [-200, -200, 200, 200])
                              shape[1], shape[0], [-200, -200, 200, 200])
        ds1 = xr.DataArray(da.zeros((2, 2), chunks=-1), dims=('y', 'x'),
        ds1 = xr.DataArray(da.ones(shape, chunks=-1), dims=('y', 'x'),
                           attrs={'start_time': datetime(2018, 1, 1, 0, 0, 0), 'area': area})
        self.ds1 = ds1
        ds2 = xr.DataArray(da.zeros((2, 2), chunks=-1), dims=('y', 'x'),
        wgt1 = xr.DataArray(da.ones(shape, chunks=-1), dims=('y', 'x'),
                            attrs={'start_time': datetime(2018, 1, 1, 0, 0, 0), 'area': area})
        self.ds1_wgt = wgt1
        ds2 = xr.DataArray(da.ones(shape, chunks=-1) * 2, dims=('y', 'x'),
                           attrs={'start_time': datetime(2018, 1, 1, 1, 0, 0), 'area': area})
        self.ds2 = ds2
        ds3 = xr.DataArray(da.zeros((2, 2), chunks=-1), dims=('y', 'time'),
        wgt2 = xr.DataArray(da.zeros(shape, chunks=-1), dims=('y', 'x'),
                            attrs={'start_time': datetime(2018, 1, 1, 0, 0, 0), 'area': area})
        self.line = 2
        wgt2[self.line, :] = 2
        self.ds2_wgt = wgt2

        ds3 = xr.DataArray(da.zeros(shape, chunks=-1), dims=('y', 'time'),
                           attrs={'start_time': datetime(2018, 1, 1, 0, 0, 0), 'area': area})
        self.ds3 = ds3
        ds4 = xr.DataArray(da.zeros((2, 2), chunks=-1), dims=('y', 'time'),
        ds4 = xr.DataArray(da.zeros(shape, chunks=-1), dims=('y', 'time'),
                           attrs={'start_time': datetime(2018, 1, 1, 1, 0, 0), 'area': area})
        self.ds4 = ds4

@@ -597,6 +609,78 @@ def test_timeseries(self):
self.assertTupleEqual((self.ds3.shape[0], self.ds3.shape[1]+self.ds4.shape[1]), res2.shape)


@pytest.fixture
def datasets_and_weights():
    """X-Array datasets with area definition plus weights for input to tests."""
    shape = (8, 12)
    area = AreaDefinition('test', 'test', 'test',
                          {'proj': 'geos', 'lon_0': -95.5, 'h': 35786023.0},
                          shape[1], shape[0], [-200, -200, 200, 200])
    ds1 = xr.DataArray(da.ones(shape, chunks=-1), dims=('y', 'x'),
                       attrs={'start_time': datetime(2018, 1, 1, 0, 0, 0), 'area': area})
    ds2 = xr.DataArray(da.ones(shape, chunks=-1) * 2, dims=('y', 'x'),
                       attrs={'start_time': datetime(2018, 1, 1, 1, 0, 0), 'area': area})
    ds3 = xr.DataArray(da.ones(shape, chunks=-1) * 3, dims=('y', 'x'),
                       attrs={'start_time': datetime(2018, 1, 1, 1, 0, 0), 'area': area})

    wgt1 = xr.DataArray(da.ones(shape, chunks=-1), dims=('y', 'x'),
                        attrs={'start_time': datetime(2018, 1, 1, 0, 0, 0), 'area': area})
    wgt2 = xr.DataArray(da.zeros(shape, chunks=-1), dims=('y', 'x'),
                        attrs={'start_time': datetime(2018, 1, 1, 0, 0, 0), 'area': area})
    wgt3 = xr.DataArray(da.zeros(shape, chunks=-1), dims=('y', 'x'),
                        attrs={'start_time': datetime(2018, 1, 1, 0, 0, 0), 'area': area})

    datastruct = {'shape': shape,
                  'area': area,
                  'datasets': [ds1, ds2, ds3],
                  'weights': [wgt1, wgt2, wgt3]}
    return datastruct


@pytest.mark.parametrize(('line', 'column',),
                         [(2, 3), (4, 5)]
                         )
def test_blend_function_weighted(datasets_and_weights, line, column):
    """Test the 'weighted' function."""
    from satpy.multiscene import weighted

    input_data = datasets_and_weights

    input_data['weights'][1][line, :] = 2
    input_data['weights'][2][:, column] = 2

    blend_result = weighted(input_data['datasets'], input_data['weights'])

    ds1 = input_data['datasets'][0]
    ds2 = input_data['datasets'][1]
    ds3 = input_data['datasets'][2]
    expected = ds1.copy()
    expected[:, column] = ds3[:, column]
    expected[line, :] = ds2[line, :]

    xr.testing.assert_equal(blend_result.compute(), expected.compute())
    assert expected.attrs == blend_result.attrs


def test_blend_function_stack(datasets_and_weights):
    """Test the 'stack' function."""
    from satpy.multiscene import stack

    input_data = datasets_and_weights

    ds1 = input_data['datasets'][0]
    ds2 = input_data['datasets'][1]

    res = stack([ds1, ds2])
    expected = ds2.copy()

    xr.testing.assert_equal(res.compute(), expected.compute())
    # assert expected.attrs == res.attrs
    # FIXME! Looks like the attributes are taken from the first dataset. Should
    # it be like that? In that case the datetime differs from "expected"
    # (= here the last dataset in the stack, the one on top).


@mock.patch('satpy.multiscene.get_enhanced_image')
def test_save_mp4(smg, tmp_path):
"""Save a series of fake scenes to an mp4 video."""
@@ -656,11 +740,11 @@ def test_save_mp4(smg, tmp_path):
with mock.patch('satpy.multiscene.imageio.get_writer') as get_writer:
get_writer.return_value = writer_mock
mscn.save_animation(
fn, client=False,
enh_args={"decorate": {
"decorate": [{
"text": {
"txt":
fn, client=False,
enh_args={"decorate": {
"decorate": [{
"text": {
"txt":
"Test {start_time:%Y-%m-%d %H:%M} - "
"{end_time:%Y-%m-%d %H:%M}"}}]}})
assert writer_mock.append_data.call_count == 2 + 2