Skip to content

Conversation

@RubenImhoff
Copy link
Contributor

The current reduced-space ensemble kalman filter blending is working well in our test setup at KNMI. However, we notice some edge artefacts sometimes in the forecast, especially for the first lead times where the nowcast is dominant, e.g. (I believe this was the probability of >0.1mm/h rainfall for all members):

image

What we think happens is that the transition nowcast - NWP at the edge of the radar domain can be sharp and because the advection step is now all the way at the end of the forecast (probability matching and the Kalman filter correction are before it), you also get discontinuities in the nowcast rainfall if a field is right at the edge of the domain. The smooth_radar_mask_range option that we have already implemented in the STEPS blending procedure might be a solution. This PR implements this option in the pca_kalman_filter_blending code.

@m-rempel, I couldn't add you as a reviewer, but maybe good if you have a look at it as well. :)

@RubenImhoff RubenImhoff self-assigned this Nov 18, 2025
@codecov
Copy link

codecov bot commented Nov 18, 2025

Codecov Report

❌ Patch coverage is 95.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.16%. Comparing base (fe3a98a) to head (07725ca).
⚠️ Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
pysteps/blending/pca_ens_kalman_filter.py 95.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #528      +/-   ##
==========================================
+ Coverage   84.14%   84.16%   +0.02%     
==========================================
  Files         168      168              
  Lines       14507    14540      +33     
==========================================
+ Hits        12207    12238      +31     
- Misses       2300     2302       +2     
Flag Coverage Δ
unit_tests 84.16% <95.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

# forecast outside the radar domain. Therefore, fill these
# areas with NWP, if requested.
if self.__forecast_state.config.smooth_radar_mask_range != 0:
nan_indices = np.isnan(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would call this nan_mask in stead of nan_indices, as this contains a boolean mask array.

mats-knmi
mats-knmi previously approved these changes Nov 19, 2025
@mats-knmi mats-knmi dismissed their stale review November 19, 2025 15:08

It doesn't work yet

@m-rempel
Copy link
Contributor

Hi @RubenImhoff !

Thank you for that nice additional work! Please keep in mind, that I've followed the strategy to store all the nan values in the variable no_data_mask and replaced these nan values within the forecast loop with thr_min. The idea behind was that the Kalman filter does all the weighting---even beyond the radar domain. So, please test whether there are actually nan values within your defined mask.

Regarding your example, actually, I've not tested the case where the domain edge is equal or smaller than the radar coverage, because in my cases thr_min values are replicated at the edges instead of precipitation values.

I will have a closer look on that tomorrow.

@dnerini dnerini requested a review from m-rempel November 19, 2025 19:49
@RubenImhoff
Copy link
Contributor Author

Hi @m-rempel, thanks for the feedback! Yes, we also saw that the majority of the problem originates from the fact that with the extrapolation_kwargs["map_coordinates_mode"] = "nearest" method, the rainfall at the edges of the domain is copied and kept during the advection steps. This results in the error you see above.

For now, the way to fix it is, is to use extrapolation_kwargs["map_coordinates_mode"] = "constant", but that can give nans at the edges if extrap_kwargs["interp_order"] = 3 (as in your example), or hard transitions with a fixed value. What was your reason to go for extrap_kwargs["interp_order"] = 3"?

Our solution for now is to try some smoothing at the edges, but that may not be necessary anymore if we find a better solution to fill the transition at the edges well. A fixed value would do or perhaps a smart resampling. What do you think?

@m-rempel
Copy link
Contributor

Hi @RubenImhoff , I have to interpolate the NWC fields at each forecast time step to do the combination since I use the intrinsic motion of the NWP precipitation fields. Therefore my intention for the bicubic interpolation was to reduce interpolation artefacts within quasi-stationary situations.

May be one could think about an improved way to do the position allocation of each precipitation value for the PCA to overcome that forecast step-wise interpolation. However, I don't have an idea to do that in a sufficient amount of time.

RubenImhoff and others added 2 commits November 20, 2025 11:08
Co-authored-by: mats-knmi <145579783+mats-knmi@users.noreply.github.com>
Co-authored-by: mats-knmi <145579783+mats-knmi@users.noreply.github.com>
@RubenImhoff
Copy link
Contributor Author

Hi @RubenImhoff , I have to interpolate the NWC fields at each forecast time step to do the combination since I use the intrinsic motion of the NWP precipitation fields. Therefore my intention for the bicubic interpolation was to reduce interpolation artefacts within quasi-stationary situations.

May be one could think about an improved way to do the position allocation of each precipitation value for the PCA to overcome that forecast step-wise interpolation. However, I don't have an idea to do that in a sufficient amount of time.

Thanks, @m-rempel! Then we continue with trying to make work this minor improvement at the edges and if we can think of an even better idea to fix this, we can implement that later.

@m-rempel
Copy link
Contributor

Hi @RubenImhoff and @mats-knmi , can you give my latest changes a try in your environment? The smooth mask should work now for nwc_prediction_btf.

@RubenImhoff
Copy link
Contributor Author

Thanks, @m-rempel! @mats-knmi is back in the office on Monday, so we will have to wait for his test case for a little longer.
In the meantime, we saw that the blending moves faster to NWP with the current setup, possibly because we are throwing in NWP at the nans at the edges. I was thinking, would it make sense to change self.__forecast_state.nwc_prediction[self.__ens_member][nan_mask] = nwp[nan_mask] to a constant and to fill it with NWP once we are in the 'if' statement with smooth_radar_mask_range?

@m-rempel
Copy link
Contributor

Yep @RubenImhoff , that would be a better strategy. I'll change this today in the evening.

@m-rempel
Copy link
Contributor

It's a little bit more complex than I thought, but it works now for timesteps > 2. The actual observation doesn't contain the NWP data and timesteps 1 and 2 have the same data due to the pure advection step without any correction at the beginning.

@RubenImhoff
Copy link
Contributor Author

Really nice work, @m-rempel! We will try it out early next week. :)
Let's see if that does the trick.

We also saw that, in our test case for the Netherlands, the ensemble spread of the blended foreccast is much smaller than of the ensemble nowcasts for the same case (for the first hour or so). Is that something you are familiar with? If it is fixable with a parameter setting, we will try that out. If not, we will make an issues.

@m-rempel
Copy link
Contributor

Yep, I saw that also @RubenImhoff. There are at least two reasons for that: First, the spread of the analysis ensemble is in general smaller than the spread of background and observation ensemble. Second, spread generation due to STEPS is suppressed by areas without precipitation in NWP forecast as well as by the probability matching at each timestep.

For the latter, I have some improvements.

@RubenImhoff
Copy link
Contributor Author

Good to know, @m-rempel!

@mats-knmi
Copy link
Contributor

Yep, I saw that also @RubenImhoff. There are at least two reasons for that: First, the spread of the analysis ensemble is in general smaller than the spread of background and observation ensemble. Second, spread generation due to STEPS is suppressed by areas without precipitation in NWP forecast as well as by the probability matching at each timestep.

For the latter, I have some improvements.

Ah this is nice to hear. I was looking into a specific case for us (one where we had almost 0 spread in the nowcast untill 2-3 hours lead time when NWP fully took over) and it seems like in that case the second issue you mentioned was the biggest contributor. So if you can improve on that that would be great.

@RubenImhoff
Copy link
Contributor Author

RubenImhoff commented Nov 26, 2025

@m-rempel, @mats-knmi and I added the functionality to also advect the domain mask. When doing so in combination with your changes, everything works as expected. I have, for now, changed the advection of the domain mask to being conditional on the smooth_radar_mask_range being larger than zero. However, when we were discussing about it, it seemed odd to us to keep overlaying a static radar domain mask in a blended product, especially when the radar data is advecting out of that domain, while there is NWP data to fill it up. What are your thoughts on this? Other than that, we are good to go, if all tests pass. :)

@m-rempel
Copy link
Contributor

Hi @RubenImhoff and @mats-knmi, my idea for that static radar domain mask arose from the utilization of radar products in observation space such as reflectivity. For the latter, the simulated counterpart is also only computed within the observational domain. Therefore, in my opinion, it is more user-friendly to keep the domain constant.

For retrieved products in model space, and especially in the case of filling up with NWP data, it makes more sense to advect the mask. Thank you to bring up this question!

@mats-knmi
Copy link
Contributor

Hi @RubenImhoff and @mats-knmi, my idea for that static radar domain mask arose from the utilization of radar products in observation space such as reflectivity. For the latter, the simulated counterpart is also only computed within the observational domain. Therefore, in my opinion, it is more user-friendly to keep the domain constant.

For retrieved products in model space, and especially in the case of filling up with NWP data, it makes more sense to advect the mask. Thank you to bring up this question!

While I do agree that keeping the mask constant is more user friendly, it is still less accurate even for just a radar nowcast. The radar data will drift out of your original mask and now new data will come in to replace it.

I would propose to just merge this PR as is and only advect the mask when we use this smooth option, since this is the only case in which it actually is an issue.

@RubenImhoff RubenImhoff merged commit fa1c5d2 into master Nov 28, 2025
10 checks passed
@m-rempel
Copy link
Contributor

m-rempel commented Dec 4, 2025

Hi @mats-knmi, I agree with you that this can be less accurate for a combined forecast. It is definitely less accurate for a pure radar nowcast when one set no echo values instead of no data values. Visually and for subsequent forecast products in a process chain based on that extrapolations.

However, the question about accuracy isn't that clear when it comes to forecast verification. Do you focus about the forecast (extrapolation) quality within the entire observation domain or within a common domain of forecast (extrapolation) and observation. With the latter you may artificially increase the quality of that extrapolation.

Regarding a combined forecast, the main concern are inconsistencies at the domain edges in both cases when (a) observation and NWP output are valid on the same domain and (b) NWP output is on a larger domain than the observation.

For (a), ideally, the ensemble spread at the respective domain edges is sufficiently large to include NWP information even at a early forecast timestep. However, I confess that this is currently not the case. Ensemble spread is computed only on a more or less small subdomain, due to the Lien criterion. This only includes grid boxes where NWC as well as NWP forecast precipitation in at least ten ensemble members.

For (b), either one doesn't use any mask and rely only on the Kalman filter resulting in a strong increase of precipitation values outside the observation domain at a certain point in forecast time and may lead to strong inconsistencies when there is precipitation at the domain edges in observation. Or one work entirely on the observation domain and adds NWP information by such a linear blending at the domain edges.

Since we have case (b) and considering all these points, I have to admit that such a linear blending of NWP information at the edges of the advected observation domain makes perfect sense. With this, one uses all available information with a minimum of inconsistencies at the domain edges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants