Skip to content

MOBT-1148 Add subperiod-selector tool#2373

Merged
MoseleyS merged 15 commits into
masterfrom
make_wet_mask
May 15, 2026
Merged

MOBT-1148 Add subperiod-selector tool#2373
MoseleyS merged 15 commits into
masterfrom
make_wet_mask

Conversation

@MoseleyS
Copy link
Copy Markdown
Member

@MoseleyS MoseleyS commented May 6, 2026

Addresses https://github.com/metoppv/mo-blue-team/issues/1148

Builds on #2092. Uses the output "Fraction of period that is wet" data and the probabilities of precipitation in all the subperiods to identify which subperiods should be classed as wet so that the total number of wet periods matches the fraction at each grid point.

Requires test data from

Testing:

  • Ran tests and they passed OK
  • Added new tests for the new feature(s)

@MoseleyS MoseleyS self-assigned this May 6, 2026
@MoseleyS MoseleyS changed the title Add subperiod-selector tool MOBT-1148 Add subperiod-selector tool May 6, 2026
Copy link
Copy Markdown
Contributor

@gavinevans gavinevans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @MoseleyS 👍

I think that this PR achieves the aim. I've made a few suggestions.

Comment thread improver/categorical/subperiod_selector.py
Comment thread improver/categorical/subperiod_selector.py Outdated
Comment thread improver/categorical/subperiod_selector.py
Comment thread improver/categorical/subperiod_selector.py
Comment thread improver/categorical/subperiod_selector.py Outdated
Comment thread improver/categorical/subperiod_selector.py
Comment thread improver/categorical/subperiod_selector.py Outdated
Comment thread improver/categorical/subperiod_selector.py Outdated
Comment thread improver_tests/acceptance/test_subperiod_selector.py Outdated
gavinevans
gavinevans previously approved these changes May 12, 2026
Copy link
Copy Markdown
Contributor

@mo-jbeaver mo-jbeaver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pytest for unit test ran successfully. Acceptance test failed as it couldn't find the threshold_kwargs.json file.
Added a comment to the acceptance test to hopefully fix the filepath to resolve that.
Also added a few other formatting suggestions.

Comment thread improver/categorical/subperiod_selector.py Outdated
Comment thread improver/categorical/subperiod_selector.py Outdated
Comment on lines +117 to +120
ValueError: If no data is found in the main period cube matching the percentile and threshold constraints.
ValueError: If no matching threshold coordinate is found on the subperiod cube.
ValueError: If no data is found in the subperiod cube matching the threshold constraints.
ValueError: If the subperiod cube does not have exactly one more dimension than the main period cube.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ValueError: If no data is found in the main period cube matching the percentile and threshold constraints.
ValueError: If no matching threshold coordinate is found on the subperiod cube.
ValueError: If no data is found in the subperiod cube matching the threshold constraints.
ValueError: If the subperiod cube does not have exactly one more dimension than the main period cube.
ValueError:
- If no data is found in the main period cube matching the percentile and threshold constraints.
- If no matching threshold coordinate is found on the subperiod cube.
- If no data is found in the subperiod cube matching the threshold constraints.
- If the subperiod cube does not have exactly one more dimension than the main period cube.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

Comment thread improver/categorical/subperiod_selector.py Outdated
Comment thread improver/cli/subperiod_selector.py Outdated
Comment thread improver_tests/acceptance/test_subperiod_selector.py Outdated
Comment on lines +14 to +19
Plugin to select which subperiods contain the phenomenon identified over the main period.

For example, if the 50th percentile of hours of light rain over a 24 hour period is 0.25 (6 hours),
then this plugin can be used to identify which 6 hours of the 24 hour period are most likely
to contain light rain. The result can be used in the weather symbol decision tree to force
the selection of a wet symbol.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand what this script does, because we've discussed this, but I'm not sure I would from the docstring alone. It's a tricky thing to summarise, but perhaps this is a bit clearer (if more verbose)?

"Identifies which parts of a longer time period are most likely to contain a particular weather phenomena.

For example, if light rain is expected for 6 hours within a 24 hour period (e.g. the 50th percentile of light rain over a 24 hour period is 0.25), this plugin selects the 6 hours most likely to contain that light rain.

This output can then be used by the weather symbol decision tree to force the selection of a wet symbol."

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have used some of this, and tweaked it a bit further (and copied the result to the CLI doc-string). Thanks for the suggestion.

Comment on lines +47 to +52
"""Identify which subperiods to select based on the selected main period diagnostic slice.

The value at each grid point in the main period data indicates the fraction of subperiods to select, and the values in the
subperiod data indicate the likelihood of the phenomenon occurring in each subperiod.
The subperiods with the highest likelihood are selected until the number of selected subperiods
matches the value from the main period data.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, a tricky one to explain! Perhaps this is a bit clearer?

"Identify the subperiods most likely to contain the phenomenon.

At each grid point, the main period data specifies how many subperiods to select, and the subperiod data gives the likelihood of the phenomenon occurring in each subperiod.

The subperiods with the highest likelihood are selected until the number of selected subperiods matches the value from the main period data.

Where multiple subperiods have equal likelihood, the selection between them is random"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed. Thanks.

Comment on lines +66 to +70
# Identify which subperiods to select for this period rank. The leading dimension is time, so argpartition
# is used to identify the indices of the subperiods with the highest likelihood.
subperiods_to_select = np.argpartition(
subperiod_data, range(number_of_subperiods), axis=0
)[-period_rank:]
Copy link
Copy Markdown
Contributor

@mo-AliceLake mo-AliceLake May 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add an explicit tie-breaking step here? np.argpartition does not guarantee random ordering for equal values, so tied subperiods may not be selected randomly (and we wouldn't want to always select the earliest time in the day, etc!).

For example, we could generate a random array tie_breaker with the same shape as subperiod_data and then use subperiods_to_select = np.lexsort((tie_breaker, superiod_data), axis=0)[-period_rank:] instead. This would still rank subperiods by likelihood first, but would use the random values to decide the ordering where likelihoods are equal.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is worth the effort. The probability fields are smooth and linearly varying. Therefore the proportion of ties is going to be vanishingly small which makes handling them well not worthwhile.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On reflection, I agree with this. 🙂 It is also quicker (computationally) which matters more.

Comment thread improver/cli/subperiod_selector.py Outdated
Comment on lines +21 to +26
Select which subperiods contain the phenomenon identified over the main period.

For example, if the 50th percentile of hours of light rain over a 24 hour period is 0.25 (6 hours),
then this plugin can be used to identify which 6 hours of the 24 hour period are most likely
to contain light rain. The result can be used in the weather symbol decision tree to force
the selection of a wet symbol.
Copy link
Copy Markdown
Contributor

@mo-AliceLake mo-AliceLake May 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As before (so if you update there, then update here too. 🙂)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Copy Markdown
Contributor

@mo-AliceLake mo-AliceLake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments around docstrings for clarity, take or leave.

Suggestion for enforced randomisation, think we should look at that one.

Copy link
Copy Markdown
Contributor

@mo-jbeaver mo-jbeaver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with the changes made and the acceptance test now passes.

MoseleyS added 3 commits May 14, 2026 14:15
* master:
  Changed implementation from clipping to masking + added Unit Tests (#2366)
  Adding kwarg as CLI argument & updating acceptance tests. (#2377)
  Refactor Pollen index for daily and hourly to single plugin (#2372)
  EPPT-3259 fix fsi duplicate metadata (#2370)
  Changes to Pollen classes for refactoring cube long names and units of concentration (#2368)
  Eppt 3223 lifted index investigate why the values are wrong (#2365)
  Cast to the original dtype of the points in `expand_bounds`  (#2367)
  Changes that might help with intermittent Stochastic Noise failure (#2346)
* master:
  change ApplyDecisionTree categorical cube dtype from int32 to int16 (#2371)
* master:
  Revert "change ApplyDecisionTree categorical cube dtype from int32 to int16 (…" (#2382)
@MoseleyS MoseleyS merged commit 68d3b70 into master May 15, 2026
7 checks passed
@MoseleyS MoseleyS deleted the make_wet_mask branch May 15, 2026 07:03
MoseleyS added a commit that referenced this pull request May 15, 2026
* master:
  MOBT-1148 Add subperiod-selector tool (#2373)
  Revert "change ApplyDecisionTree categorical cube dtype from int32 to int16 (…" (#2382)
  change ApplyDecisionTree categorical cube dtype from int32 to int16 (#2371)
  Changed implementation from clipping to masking + added Unit Tests (#2366)
  Adding kwarg as CLI argument & updating acceptance tests. (#2377)
  Refactor Pollen index for daily and hourly to single plugin (#2372)
  EPPT-3259 fix fsi duplicate metadata (#2370)
  Changes to Pollen classes for refactoring cube long names and units of concentration (#2368)
  Eppt 3223 lifted index investigate why the values are wrong (#2365)
  Cast to the original dtype of the points in `expand_bounds`  (#2367)
  Changes that might help with intermittent Stochastic Noise failure (#2346)

# Conflicts:
#	improver/categorical/subperiod_selector.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants