MOBT-1148 Add subperiod-selector tool by MoseleyS · Pull Request #2373 · metoppv/improver

MoseleyS · 2026-05-06T12:57:22Z

Addresses https://github.com/metoppv/mo-blue-team/issues/1148

Builds on #2092. Uses the "Fraction of period that is wet" output, together with the probabilities of precipitation in each subperiod, to identify which subperiods should be classed as wet, so that at each grid point the number of wet subperiods corresponds to the wet fraction.
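A minimal NumPy sketch of the selection described above. It assumes plain arrays rather than iris cubes: `fraction_wet` has shape (y, x) and `subperiod_probs` has shape (t, y, x), with time leading. The function name, the use of `np.rint` to turn the fraction into a whole number of subperiods, and the rank-based approach are illustrative assumptions, not the plugin's actual implementation.

```python
import numpy as np


def select_wet_subperiods(fraction_wet, subperiod_probs):
    """Return a boolean mask (t, y, x) marking the most likely wet subperiods.

    fraction_wet: shape (y, x), fraction of the main period that is wet.
    subperiod_probs: shape (t, y, x), probability of precipitation in each
        of the t subperiods (time is the leading dimension).
    """
    n_subperiods = subperiod_probs.shape[0]
    # Assumption: round the wet fraction to a whole number of subperiods.
    n_wet = np.rint(fraction_wet * n_subperiods).astype(int)
    # Sort each grid point's subperiods from most to least likely; the stable
    # sort keeps equal probabilities in time order.
    order = np.argsort(-subperiod_probs, axis=0, kind="stable")
    # Inverting the permutation gives each subperiod's rank (0 = most likely).
    ranks = np.argsort(order, axis=0)
    # Select a subperiod if its rank is below the number of wet subperiods.
    return ranks < n_wet[np.newaxis, ...]


# Hypothetical data: 4 subperiods at 2 grid points (y=1, x=2).
fraction_wet = np.array([[0.25, 0.5]])
subperiod_probs = np.array(
    [[0.1, 0.8], [0.9, 0.1], [0.3, 0.7], [0.2, 0.6]]
)[:, np.newaxis, :]
mask = select_wet_subperiods(fraction_wet, subperiod_probs)
```

Here the first grid point gets one wet subperiod (the 0.9 one) and the second gets two (0.8 and 0.7). Because the sort is stable, equal probabilities resolve to the earliest subperiod, which is the kind of systematic preference raised in the review discussion.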

Requires test data from

MOBT-1148 Adds inputs and KGO for subperiod-selector acceptance test improver_test_data#131

Testing:

Ran tests and they passed OK
Added new tests for the new feature(s)

gavinevans

Thanks @MoseleyS 👍

I think that this PR achieves the aim. I've made a few suggestions.

mo-jbeaver

The unit tests ran successfully with pytest. The acceptance test failed because it couldn't find the threshold_kwargs.json file.
I've added a comment on the acceptance test with a suggested fix for the file path, which should resolve that.
I've also added a few other formatting suggestions.

mo-jbeaver · 2026-05-12T13:03:53Z

+            ValueError: If no data is found in the main period cube matching the percentile and threshold constraints.
+            ValueError: If no matching threshold coordinate is found on the subperiod cube.
+            ValueError: If no data is found in the subperiod cube matching the threshold constraints.
+            ValueError: If the subperiod cube does not have exactly one more dimension than the main period cube.


Suggested change

ValueError: If no data is found in the main period cube matching the percentile and threshold constraints.

ValueError: If no matching threshold coordinate is found on the subperiod cube.

ValueError: If no data is found in the subperiod cube matching the threshold constraints.

ValueError: If the subperiod cube does not have exactly one more dimension than the main period cube.

ValueError:

- If no data is found in the main period cube matching the percentile and threshold constraints.

- If no matching threshold coordinate is found on the subperiod cube.

- If no data is found in the subperiod cube matching the threshold constraints.

- If the subperiod cube does not have exactly one more dimension than the main period cube.
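To illustrate the last of these checks, here is a minimal sketch that assumes the data have already been extracted from the cubes as NumPy arrays. The function name `check_subperiod_dimensions` is hypothetical and not part of the plugin.

```python
import numpy as np


def check_subperiod_dimensions(main_period_data, subperiod_data):
    """Raise ValueError unless subperiod_data has exactly one more dimension
    than main_period_data (the extra, leading dimension being the subperiods)."""
    if subperiod_data.ndim != main_period_data.ndim + 1:
        raise ValueError(
            "The subperiod cube must have exactly one more dimension than the "
            f"main period cube; got {subperiod_data.ndim} and "
            f"{main_period_data.ndim} dimensions respectively."
        )


# A (time, y, x) subperiod array alongside a (y, x) main period array passes.
check_subperiod_dimensions(np.zeros((2, 3)), np.zeros((24, 2, 3)))
```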

mo-AliceLake · 2026-05-12T15:10:25Z

+    Plugin to select which subperiods contain the phenomenon identified over the main period.
+
+    For example, if the 50th percentile of hours of light rain over a 24 hour period is 0.25 (6 hours),
+    then this plugin can be used to identify which 6 hours of the 24 hour period are most likely
+    to contain light rain. The result can be used in the weather symbol decision tree to force
+    the selection of a wet symbol.


I understand what this script does, because we've discussed this, but I'm not sure I would from the docstring alone. It's a tricky thing to summarise, but perhaps this is a bit clearer (if more verbose)?

"Identifies which parts of a longer time period are most likely to contain a particular weather phenomenon.

For example, if light rain is expected for 6 hours within a 24 hour period (e.g. the 50th percentile of light rain over a 24 hour period is 0.25), this plugin selects the 6 hours most likely to contain that light rain.

This output can then be used by the weather symbol decision tree to force the selection of a wet symbol."

I have used some of this, and tweaked it a bit further (and copied the result to the CLI doc-string). Thanks for the suggestion.
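The arithmetic in the docstring example, written out with $f$ the wet fraction of the main period and $N$ the number of subperiods it contains (hourly subperiods assumed):

```math
n = f \times N = 0.25 \times 24 = 6
```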

mo-AliceLake · 2026-05-12T15:16:18Z

+        """Identify which subperiods to select based on the selected main period diagnostic slice.
+
+        The value at each grid point in the main period data indicates the fraction of subperiods to select, and the values in the
+        subperiod data indicate the likelihood of the phenomenon occurring in each subperiod.
+        The subperiods with the highest likelihood are selected until the number of selected subperiods
+        matches the value from the main period data.


Similarly, a tricky one to explain! Perhaps this is a bit clearer?

"Identify the subperiods most likely to contain the phenomenon.

At each grid point, the main period data specifies how many subperiods to select, and the subperiod data gives the likelihood of the phenomenon occurring in each subperiod.

The subperiods with the highest likelihood are selected until the number of selected subperiods matches the value from the main period data.

Where multiple subperiods have equal likelihood, the selection between them is random."

Changed. Thanks.

mo-AliceLake · 2026-05-12T15:29:26Z

+            # Identify which subperiods to select for this period rank. The leading dimension is time, so argpartition
+            # is used to identify the indices of the subperiods with the highest likelihood.
+            subperiods_to_select = np.argpartition(
+                subperiod_data, range(number_of_subperiods), axis=0
+            )[-period_rank:]
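A small standalone illustration of the `np.argpartition` call in the diff above, using made-up probabilities for 4 subperiods at 3 grid points (the data and the single-pivot alternative are illustrative, not taken from the PR). Passing `range(number_of_subperiods)` as `kth` places every element in its sorted position, so the last `period_rank` rows hold the indices of the most likely subperiods in ascending order of likelihood.

```python
import numpy as np

# Hypothetical data: rows are subperiods (leading time axis), columns are grid points.
subperiod_data = np.array(
    [[0.1, 0.8, 0.5],
     [0.9, 0.1, 0.4],
     [0.3, 0.7, 0.6],
     [0.2, 0.6, 0.3]]
)
number_of_subperiods = subperiod_data.shape[0]
period_rank = 2

# As in the diff: kth=range(n) fully orders each column, so the last
# period_rank rows are the indices of the largest values.
full = np.argpartition(
    subperiod_data, range(number_of_subperiods), axis=0
)[-period_rank:]

# A single pivot also separates the top period_rank values, although the
# order within the selected block is then unspecified.
kth = number_of_subperiods - period_rank
partial = np.argpartition(subperiod_data, kth, axis=0)[kth:]
```

With distinct values both forms select the same subperiods; the full ordering does slightly more work, which is negligible for the small number of subperiods involved.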


Should we add an explicit tie-breaking step here? np.argpartition does not guarantee random ordering for equal values, so tied subperiods may not be selected randomly (and we wouldn't want to always select the earliest time in the day, etc!).

For example, we could generate a random array tie_breaker with the same shape as subperiod_data and then use subperiods_to_select = np.lexsort((tie_breaker, subperiod_data), axis=0)[-period_rank:] instead. This would still rank subperiods by likelihood first, but would use the random values to decide the ordering where likelihoods are equal.

I don't think this is worth the effort. The probability fields are smooth and vary linearly, so the proportion of ties will be vanishingly small, which makes handling them carefully not worthwhile.

On reflection, I agree with this. 🙂 It is also computationally quicker, which matters more.
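For reference, the lexsort-based tie-breaking suggested above (and not adopted) might look like the sketch below; `top_k_with_random_ties` and its `seed` argument are hypothetical names. `np.lexsort` treats its last key as the primary sort key, so the random values only reorder subperiods whose likelihoods are equal.

```python
import numpy as np


def top_k_with_random_ties(subperiod_data, period_rank, seed=None):
    """Return the indices, along the leading (time) axis, of the period_rank
    highest values at each grid point, ordering equal values at random."""
    rng = np.random.default_rng(seed)
    # Random secondary key with one value per element.
    tie_breaker = rng.random(subperiod_data.shape)
    # np.lexsort sorts by the LAST key first: subperiod_data is the primary
    # key and tie_breaker only decides the order among equal values.
    order = np.lexsort((tie_breaker, subperiod_data), axis=0)
    # Ascending order, so the highest values are at the end.
    return order[-period_rank:]


# Subperiods 0 and 1 are tied; subperiod 3 is always selected.
subperiod_data = np.array([[0.5], [0.5], [0.1], [0.9]])
selected = top_k_with_random_ties(subperiod_data, period_rank=2, seed=0)
```

The extra random array and second sort key add work on every call, which is the computational cost weighed against the rarity of ties in the discussion above.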

mo-AliceLake · 2026-05-12T15:31:41Z

+    Select which subperiods contain the phenomenon identified over the main period.
+
+    For example, if the 50th percentile of hours of light rain over a 24 hour period is 0.25 (6 hours),
+    then this plugin can be used to identify which 6 hours of the 24 hour period are most likely
+    to contain light rain. The result can be used in the weather symbol decision tree to force
+    the selection of a wet symbol.


As before (so if you update there, then update here too. 🙂)

mo-AliceLake

Minor comments around docstrings for clarity, take or leave.

I think we should look at the suggestion for enforced randomisation.

mo-jbeaver

Happy with the changes made and the acceptance test now passes.

* master:
  Changed implementation from clipping to masking + added Unit Tests (#2366)
  Adding kwarg as CLI argument & updating acceptance tests. (#2377)
  Refactor Pollen index for daily and hourly to single plugin (#2372)
  EPPT-3259 fix fsi duplicate metadata (#2370)
  Changes to Pollen classes for refactoring cube long names and units of concentration (#2368)
  Eppt 3223 lifted index investigate why the values are wrong (#2365)
  Cast to the original dtype of the points in `expand_bounds` (#2367)
  Changes that might help with intermittent Stochastic Noise failure (#2346)

* master:
  MOBT-1148 Add subperiod-selector tool (#2373)
  Revert "change ApplyDecisionTree categorical cube dtype from int32 to int16 (…" (#2382)
  change ApplyDecisionTree categorical cube dtype from int32 to int16 (#2371)
  Changed implementation from clipping to masking + added Unit Tests (#2366)
  Adding kwarg as CLI argument & updating acceptance tests. (#2377)
  Refactor Pollen index for daily and hourly to single plugin (#2372)
  EPPT-3259 fix fsi duplicate metadata (#2370)
  Changes to Pollen classes for refactoring cube long names and units of concentration (#2368)
  Eppt 3223 lifted index investigate why the values are wrong (#2365)
  Cast to the original dtype of the points in `expand_bounds` (#2367)
  Changes that might help with intermittent Stochastic Noise failure (#2346)

# Conflicts:
# improver/categorical/subperiod_selector.py

MoseleyS added 9 commits May 1, 2026 15:46

Adds plugin to create a masked cube showing where the highest probabilities for a phenomenon are present.

7d1ae5e

Adds SubperiodSelector plugin to API

11f220c

Adds SubperiodSelector plugin to CLI

591b394

Adds option to set the name of the output cube

2d67895

Adds acceptance test for subperiod-selector, and fixes bugs arising from it.

d6823eb

Updates unit test data type

b719dff

Fixes an indexing bug in subperiod_selector.py

897e2fd

Adds checksums for new acceptance test data

019202c

Adds ValueErrors to doc-string

3f8791d

MoseleyS self-assigned this May 6, 2026

MoseleyS mentioned this pull request May 6, 2026

MOBT-1148 Adds inputs and KGO for subperiod-selector acceptance test metoppv/improver_test_data#131

Merged

MoseleyS changed the title ~~Add subperiod-selector tool~~ MOBT-1148 Add subperiod-selector tool May 6, 2026

gavinevans requested changes May 12, 2026

Review response

19ec7ef

gavinevans previously approved these changes May 12, 2026

mo-jbeaver requested changes May 12, 2026

mo-AliceLake reviewed May 12, 2026

Improves doc-strings following 2nd and 3rd reviews. Fixes bug introduced to acceptance test.

1b46fc1

MoseleyS dismissed gavinevans’s stale review via 1b46fc1 May 14, 2026 08:20

Another doc-string tweak

59c2583

mo-jbeaver approved these changes May 14, 2026

MoseleyS mentioned this pull request May 14, 2026

Add subperiod-selector tool improver to release branch #2381

Open

2 tasks

MoseleyS added 3 commits May 14, 2026 14:15

Merge branch 'master' into make_wet_mask

246221f

* master: change ApplyDecisionTree categorical cube dtype from int32 to int16 (#2371)

Merge branch 'master' into make_wet_mask

73a2dee

* master: Revert "change ApplyDecisionTree categorical cube dtype from int32 to int16 (…" (#2382)

MoseleyS merged commit 68d3b70 into master May 15, 2026
7 checks passed

MoseleyS deleted the make_wet_mask branch May 15, 2026 07:03
