# Part 5: Discussion and Interpretation of Results, Future Improvement Ideas

In this part we study the performance and trends of the forecasts and compare them to the observations. We tried prediction leads of 1, 3 and 7 days and developed three ML regression models. The setups are not exhaustively optimized on any front, so they are not discussed here beyond stating the high-ranking input variables for their scientific relevance. These results should therefore not be taken as final, but as interim observations that will hopefully help future development. Some development ideas are discussed at the end. General notes:

- As expected, the shorter the prediction lead, the higher the predictive performance. The correlation between observed and predicted values was 99%, 94% and 91% for the 1-, 3- and 7-day leads, respectively. It is also noteworthy that the best agreement was around 200-300 cm, which is close to the quarter-century average, so most of the available training points also came from this regime. Extremes are most prone to accuracy loss as the lead increases. Even with the shortest lead, we see that flood peaks are difficult to model.

<div style="display:flex; justify-content:center; gap:20px;">
  <figure style="text-align:center;">
    <img src="finalplots_1day/obs_pred_corr.png" 
         alt="Rhine catchment" 
         width="500" 
         style="border:2px solid black;">
    <figcaption style="font-style:italic; color:gray; margin-top:5px;">
      Figure 1a. 1 day
    </figcaption>
  </figure>

  <figure style="text-align:center;">
    <img src="finalplots_3day/obs_pred_corr.png" 
         alt="Sub-basins of Rhine" 
         width="500" 
         style="border:2px solid black;">
    <figcaption style="font-style:italic; color:gray; margin-top:5px;">
      Figure 1b. 3 day
    </figcaption>
  </figure>

  <figure style="text-align:center;">
    <img src="finalplots_7day/obs_pred_corr.png" 
         alt="Sub-basins of Rhine" 
         width="500" 
         style="border:2px solid black;">
    <figcaption style="font-style:italic; color:gray; margin-top:5px;">
      Figure 1c. 7 day
    </figcaption>
  </figure>
</div>
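
To make the comparison above reproducible, the agreement can be quantified both overall and per water-level regime. The following is a minimal sketch, not the code used for the figures: the function names, the bin edges and the synthetic data (whose noise grows with water level) are assumptions chosen only for illustration.

```python
import numpy as np

def pearson_r(obs, pred):
    """Pearson correlation coefficient between two equally long series."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.corrcoef(obs, pred)[0, 1])

def rmse_by_regime(obs, pred, edges):
    """RMSE of the prediction within each observed water-level bin [lo, hi)."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    result = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (obs >= lo) & (obs < hi)
        if mask.any():  # skip empty bins
            result[(lo, hi)] = float(np.sqrt(np.mean((pred[mask] - obs[mask]) ** 2)))
    return result

# Synthetic stand-in data in cm (NOT the study's data): error grows with water level
rng = np.random.default_rng(0)
obs = rng.uniform(100, 700, size=2000)
pred = obs + rng.normal(0, 0.02 * obs)

print("overall r:", round(pearson_r(obs, pred), 3))
for regime, err in rmse_by_regime(obs, pred, [100, 200, 300, 500, 700]).items():
    print(regime, round(err, 1))
```

A single correlation value hides where the errors sit, so a per-regime error like this is one way to check whether the extremes really degrade faster than the 200-300 cm regime as the lead grows.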

- Our original motivation was to improve on the underestimated January 2018 flood peak. Let's see how the ML model compares to the observation and the ECMWF prediction. For the 1-day lead, our model peaks early, similarly to ECMWF, but its central value gets the height better. However, this advantage vanishes if we consider the quantiles: the ECMWF model has a very narrow 10-90% quantile band, which is not even visible as a spread on the plot. The corresponding band of our model (light blue) carries a much larger uncertainty, so we cannot claim to capture the peak better. If we consider the 40-60% quantile band, we do better. In the 3-day lead case, the ECMWF spread becomes visible as well. Both models shift the peak a bit further towards earlier days, but the ML model's shift is smaller than that of ECMWF. This becomes evident at the 7-day lead, where the ECMWF model's shift is so large that its peak is no longer visible. The ML model has a shape comparable to the 3-day lead case, but with larger uncertainties. One issue that still needs to be understood is that the ML model seems to have more uncertainty in the 1-day lead case than in the 3-day lead case, which should not be the case in general.

<div style="text-align:center;">
  <figure>
    <img src="finalplots_1day/obs_pred_rfc_jan18.png" width="700" style="border:2px solid black;">
    <figcaption style="font-style:italic; color:gray; margin-top:5px;">
      Figure 2a. 1 day
    </figcaption>
  </figure>

  <figure>
    <img src="finalplots_3day/obs_pred_rfc_jan18.png" width="700" style="border:2px solid black;">
    <figcaption style="font-style:italic; color:gray; margin-top:5px;">
      Figure 2b. 3 day
    </figcaption>
  </figure>

  <figure>
    <img src="finalplots_7day/obs_pred_rfc_jan18.png" width="700" style="border:2px solid black;">
    <figcaption style="font-style:italic; color:gray; margin-top:5px;">
      Figure 2c. 7 day
    </figcaption>
  </figure>
</div>
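
The peak comparison above is made by eye. A minimal sketch of how the timing shift and the height error of a predicted peak could be put into numbers is given below; the function name and the daily water levels are made up for illustration and are not taken from the study.

```python
import numpy as np

def peak_shift_and_error(obs, pred):
    """Return (shift_in_steps, height_error) of the predicted peak.

    A negative shift means the predicted peak arrives earlier than the
    observed one; a negative height error means the peak is underestimated.
    """
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    i_obs = int(np.argmax(obs))    # time index of the observed peak
    i_pred = int(np.argmax(pred))  # time index of the predicted peak
    return i_pred - i_obs, float(pred[i_pred] - obs[i_obs])

# Made-up daily water levels in cm around a flood event
obs = [300, 420, 560, 690, 640, 520]
pred = [310, 450, 640, 610, 560, 480]

shift, height_error = peak_shift_and_error(obs, pred)
print(f"peak shift: {shift} day(s), height error: {height_error} cm")
```

Applied to the central value and to the quantile curves of each model separately, the same two numbers would allow the 1-, 3- and 7-day cases to be compared in a table instead of visually. The window passed in should contain a single flood event, since `argmax` only finds the highest value.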

- Let's also look at the full prediction range, the year 2018. We see that the ML model's central value outperforms ECMWF in the 1-day and 3-day lead cases. The difference diminishes with increasing lead time, however, and the two models become very similar when predicting a week in advance. The 10-90% quantile uncertainty is much larger for the ML model in all regimes. It should also be noted that our error estimate only considers the statistical variations in the training, whereas model input uncertainties are not considered. The ECMWF model already has a full and robust error estimation.

<div style="text-align:center;">
  <figure>
    <img src="finalplots_1day/obs_pred_rfc_full.png" width="700" style="border:2px solid black;">
    <figcaption style="font-style:italic; color:gray; margin-top:5px;">
      Figure 3a. 1 day
    </figcaption>
  </figure>

  <figure>
    <img src="finalplots_3day/obs_pred_rfc_full.png" width="700" style="border:2px solid black;">
    <figcaption style="font-style:italic; color:gray; margin-top:5px;">
      Figure 3b. 3 day
    </figcaption>
  </figure>

  <figure>
    <img src="finalplots_7day/obs_pred_rfc_full.png" width="700" style="border:2px solid black;">
    <figcaption style="font-style:italic; color:gray; margin-top:5px;">
      Figure 3c. 7 day
    </figcaption>
  </figure>
</div>
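
The quantile bands discussed above can be checked with two simple numbers: their mean width and the fraction of observations they contain. The sketch below assumes that the band comes from an ensemble of predictions (for example, models from repeated trainings), which is an assumption and not necessarily how the bands in the figures were produced; all names and values are illustrative.

```python
import numpy as np

def quantile_band(member_preds, lower=0.10, upper=0.90):
    """Per-time-step quantile band from predictions of shape (n_members, n_times)."""
    member_preds = np.asarray(member_preds, dtype=float)
    lo = np.quantile(member_preds, lower, axis=0)
    hi = np.quantile(member_preds, upper, axis=0)
    return lo, hi

def coverage(obs, lo, hi):
    """Fraction of observations that fall inside the band [lo, hi]."""
    obs = np.asarray(obs, dtype=float)
    return float(np.mean((obs >= lo) & (obs <= hi)))

# Made-up ensemble: 3 members, 2 time steps, water level in cm
members = [[200, 310],
           [210, 320],
           [220, 330]]
obs = [210, 340]

lo, hi = quantile_band(members)
print("mean band width:", float(np.mean(hi - lo)))
print("coverage:", coverage(obs, lo, hi))
```

A well-calibrated 10-90% band should contain roughly 80% of the observations. Comparing the mean band width between lead times is also a direct way to follow up on whether the uncertainty grows with the lead as expected.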

- When we look at the importance ranking of the variables, we see that the 1-day lead is dominated by the three Middle Rhine gauges. With longer forecasting periods, the gauges on the tributaries of the Rhine become important, such as Rudew (Ruw) on the Moselle and Heidelberg (Hdb) on the Neckar. Non-gauge parameters do not rank high: only mean temperature and precipitation appear in the list of top 20 ranked variables.

```python
import pandas as pd
from IPython.display import display

# Read in the top-ranked features for each lead time.
# header=0 replaces the header row of each file with the names in `cols`.
cols = ["rank", "Feature", "Importance"]

df1 = pd.read_csv("finalplots_1day/feature_importance_top15.csv", names=cols, header=0)
df2 = pd.read_csv("finalplots_3day/feature_importance_top15.csv", names=cols, header=0)
df3 = pd.read_csv("finalplots_7day/feature_importance_top15.csv", names=cols, header=0)

# Harmonise dtypes so the three tables merge cleanly on "rank"
for df in [df1, df2, df3]:
    df["rank"] = df["rank"].astype(int)
    df["Importance"] = df["Importance"].astype(float)

# Rename columns so each lead time keeps its own feature/importance pair
df1 = df1.rename(columns={"Feature": "Param_1", "Importance": "Value_1"})
df2 = df2.rename(columns={"Feature": "Param_3", "Importance": "Value_3"})
df3 = df3.rename(columns={"Feature": "Param_7", "Importance": "Value_7"})

# Merge on rank; the outer join keeps ranks that are missing from any table
df_merged = (
    df1.merge(df2, on="rank", how="outer")
       .merge(df3, on="rank", how="outer")
       .sort_values("rank")
)

# Keep only the top-ranked rows
df_merged = df_merged[df_merged["rank"] <= 20]

# Display as a styled table, shading the importance columns
styled = (
    df_merged.style
    .set_caption("Feature Importance Comparison (1, 3, 7 day lead)")
    .background_gradient(cmap="Blues", subset=["Value_1", "Value_3", "Value_7"])
    .format(precision=3)
    .set_table_styles([{"selector": "caption",
                        "props": [("font-size", "14pt"),
                                  ("text-align", "center"),
                                  ("font-weight", "bold")]}])
)

display(styled)
```

| rank | Param_1 | Value_1 | Param_3 | Value_3 | Param_7 | Value_7 |
|---|---|---|---|---|---|---|
| 0 | Bng_h | 0.375 | Hdb_h | 0.241 | Ruw_h | 0.094 |
| 1 | Kau_h | 0.238 | Bng_h | 0.147 | Hdb_h | 0.068 |
| 2 | Stg_h | 0.181 | Ruw_h | 0.136 | Ruw_h_lag1 | 0.049 |
| 3 | Ruw_h | 0.047 | Kau_h | 0.055 | Bng_h_lag1 | 0.047 |
| 4 | Ruw_h_lag1 | 0.032 | Fra_h | 0.042 | Fra_h | 0.038 |
| 5 | Fra_h | 0.024 | Klk_h | 0.025 | Bng_h | 0.028 |
| 6 | Klk_h | 0.017 | sbs_prep_3 | 0.017 | Stg_h | 0.028 |
| 7 | Hdb_h | 0.012 | Rhw_h | 0.015 | Kau_h | 0.024 |
| 8 | Sie_h | 0.004 | Ruw_diff | 0.015 | Bng_h_lag3 | 0.021 |
| 9 | Ahr_h | 0.003 | sbs_prep_3_lag1 | 0.013 | sbs_prep_3 | 0.014 |
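
Several entries in the ranking refer to the same source at different lags (for example `Bng_h`, `Bng_h_lag1` and `Bng_h_lag3`), so summing the importance per source gives a complementary view. The sketch below uses the 7-day values shown above. The reading of the suffixes (`_h` as gauge height, `_lagN` and `_diff` as derived versions of the same variable) is an assumption based on the names only, and the helper function is hypothetical.

```python
import re
from collections import defaultdict

def importance_by_source(importances):
    """Sum feature importances per source after stripping lag/diff and height suffixes."""
    totals = defaultdict(float)
    for name, value in importances.items():
        base = re.sub(r"_(lag\d+|diff)$", "", name)  # assumed: derived-feature suffix
        base = re.sub(r"_h$", "", base)              # assumed: gauge-height suffix
        totals[base] += value
    return dict(totals)

# 7-day lead values copied from the Param_7 / Value_7 columns of the ranking
lead7 = {
    "Ruw_h": 0.094, "Hdb_h": 0.068, "Ruw_h_lag1": 0.049, "Bng_h_lag1": 0.047,
    "Fra_h": 0.038, "Bng_h": 0.028, "Stg_h": 0.028, "Kau_h": 0.024,
    "Bng_h_lag3": 0.021, "sbs_prep_3": 0.014,
}

# Print the sources from most to least important
for source, total in sorted(importance_by_source(lead7).items(), key=lambda kv: -kv[1]):
    print(f"{source:12s} {total:.3f}")
```

Grouped this way, a gauge whose individual features rank low can still carry a large share through its lagged versions, which the flat ranking does not show directly.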


## To-dos and improvement ideas

This study was meant to establish a working baseline framework for further, finer studies on the topic. It is far from complete. Some ideas for future development are listed below:

- A better uncertainty estimate has to be made for the ML model, one that also considers the input uncertainties
- Trying other ML architectures; LSTM in particular is proposed in the literature as a better-performing approach
- Better hyperparameter optimization: so far we only did a quick study, without much time spent on tuning the parameters
- Change the target: the target could be changed to discharge, or to the relative change instead of the absolute value of height/discharge
- Develop/add new input variables:
    - One important missing input variable is soil moisture. Also, since we noticed the dominance of water gauge information in the importance ranking, it would be natural to try expanding the number of gauges used
    - Some variables could be reformulated better. For example, instead of having the average precipitation in the sbs_1 area and the sbs_1 area size as separate variables, a single variable formed by multiplying the two could be tried
    - Distance, elevation and slope information for the sub-basins or water gauges might be helpful to add
    - We used water height, but discharge could also be used as input
    - Training with only non-gauge parameters and with only gauge parameters, to select the best subset of both
- Studying the sources of uncertainty in our model
- Studying the water-level (Pegel) prediction model of the German BfG. Apparently they have their own models, but past prediction data is not available online for study and comparison
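
Two of the ideas above, the combined precipitation variable and the relative-change target, are simple enough to sketch. The column names (`sbs_1_prep`, `sbs_1_area`, `h`), the units (mm and km²) and the function itself are hypothetical and only illustrate the idea; they are not the names used in our pipeline.

```python
import pandas as pd

def add_engineered_columns(df, lead_days):
    """Return a copy of df with a precipitation-volume feature and a change-in-height target."""
    out = df.copy()
    # Assumed units: precipitation in mm, area in km^2; 1 mm over 1 km^2 is 1000 m^3
    out["sbs_1_prep_volume"] = out["sbs_1_prep"] * out["sbs_1_area"] * 1000.0
    # Relative target: change in water height between today and today + lead
    out["delta_h"] = out["h"].shift(-lead_days) - out["h"]
    return out

# Made-up daily values for illustration
df = pd.DataFrame({
    "h": [200, 210, 230, 260],         # water height in cm
    "sbs_1_prep": [1.0, 2.0, 0.0, 4.0],  # mean precipitation in mm
    "sbs_1_area": [50.0] * 4,            # sub-basin area in km^2
})

print(add_engineered_columns(df, lead_days=1))
```

The last `lead_days` rows have no future value, so their target is `NaN` and they would have to be dropped before training. With such a target, the absolute height is recovered by adding the predicted change to the current height.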