# Bias correction of weather forecasts
Using weather forecasts generated by climate models in hydrology has some methodological implications. Forecasts of this type, such as seasonal forecasts, are generally produced at coarse grid resolutions, which can lead to systematic errors, or bias. Bias correction is therefore a recommended practice before using weather forecasts to simulate streamflows, particularly when the catchment is of similar size to, or smaller than, the grid scale ([Crochemore et al., 2016](https://www.hydrol-earth-syst-sci.net/20/3601/2016/)).

This notebook applies a linear scaling approach to correct the bias of seasonal forecasts, in this case the [ECMWF SEAS5](https://www.ecmwf.int/en/newsletter/154/meteorology/ecmwfs-new-long-range-forecasting-system-seas5) rainfall, temperature and evaporation ensembles.

## 1. Import general libraries
First, we need to import the necessary libraries and sub-modules. **Only if iRONs is run locally**: since three required libraries, [plotly](https://plot.ly/), [netCDF4](https://pypi.org/project/netCDF4/) and [Numba](http://numba.pydata.org/), are not available on Anaconda by default, you must install them first. Help on how to install these libraries is given here: [How to install libraries](../../A%20-%20Knowledge%20transfer/0%20-%20Tutorials/0.b%20-%20How%20to%20install%20libraries.ipynb). If iRONs is run on the cloud, e.g. on [Binder](https://mybinder.org/) or [Microsoft Azure Notebooks](https://notebooks.azure.com/), the libraries do not need to be installed before importing them.

Once all the necessary libraries are installed locally or in the case that we are running iRONs on the cloud, we can import them with the following code:

In [62]:
import numpy as np
import pandas as pd
import plotly.graph_objs as go
from netCDF4 import Dataset # to extract data from NetCDF files (format of the downloaded ECMWF files)
import numba # a just-in-time compiler to speed up the code, in particular loops
import os
import sys
# To import your own submodules
sys.path.append('../Submodules')
from cum2inst import cum2inst
from Read_data import read_csv_data,read_netcdf_data
from Bias_correction import linear_scaling

## 2. Initial options
Second, we define the originating centre of the forecast, the initial date of the forecast and the area of the catchment.

In [63]:
origin_centre = 'ECMWF' # forecast originating centre
file_format = 'netcdf'
path_fore_data = origin_centre+' data//'+file_format+' files'
year     = 2014   # year
month    = 11     # month
day      = 1      # day
name_fore_file_end = "_1d_7m_"+origin_centre+"_Temp_Evap_Rain.nc"
name_fore_file = str(year)+str(month).zfill(2)+str(day).zfill(2)+name_fore_file_end # uses the day defined above
catchment_area = 28.8 # km2

## 3. Load data and define bias correction
### 3.1 Forecast data
The downloaded forecast files are in [NetCDF](https://confluence.ecmwf.int/display/CKB/What+are+NetCDF+files+and+how+can+I+read+them) (Network Common Data Form) format. This file format supports the creation, access, and sharing of array-oriented scientific data.
Here we will extract temperature, evaporation and rainfall data using their corresponding short names: 't2m', 'e' and 'tp' respectively. You can find a complete list of weather parameters with their corresponding short names in this [link](https://apps.ecmwf.int/codes/grib/param-db/?filter=netcdf).

#### 3.1.1 Extract temperature data (temperature at 2m over the surface: 't2m')
The original data is in kelvin (K).

In [64]:
dates_fore,Temp_fore = read_netcdf_data(path_fore_data,name_fore_file,'t2m')
# Spatially averaged data and converted into degC
Temp_fore_ens = Temp_fore.mean(3).mean(2)-273.15
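The spatial averaging and unit conversion above can be illustrated on synthetic data. This is only a sketch: it assumes the array returned by `read_netcdf_data` is laid out as (time, ensemble member, latitude, longitude), which the calls `.mean(3).mean(2)` suggest but the notebook does not state, and the values below are made up.

```python
import numpy as np

# Synthetic stand-in for the forecast array (values in kelvin).
# ASSUMED layout: (time, ensemble member, latitude, longitude).
rng = np.random.default_rng(0)
temp_k = 280.0 + rng.normal(0.0, 2.0, size=(5, 3, 2, 2))

# Average over longitude (axis 3), then latitude (axis 2), then convert K -> degC.
temp_c_ens = temp_k.mean(3).mean(2) - 273.15

# Equivalent single call that averages over both spatial axes at once.
temp_c_alt = temp_k.mean(axis=(2, 3)) - 273.15

print(temp_c_ens.shape)  # one value per (time step, ensemble member)
```

Note that this is an unweighted mean over grid cells; it does not account for differences in cell area.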

#### 3.1.2 Extract evaporation data (evaporation: 'e')
The original data is in daily cumulative metres, so we use the sub-module **cum2inst** to transform the cumulative data into instantaneous (daily) values.

We also convert the data from metres into millimetres.

In [65]:
dates_fore,Evap_fore = read_netcdf_data(path_fore_data,name_fore_file,'e')
# Spatially averaged data, with the sign reversed and converted into mm
Evap_fore_cum = -Evap_fore.mean(3).mean(2)*1000
# Cumulative to instantaneous data
Evap_fore_ens = cum2inst(Evap_fore_cum)
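The source of `cum2inst` is not shown in this notebook. As a rough sketch of what a cumulative-to-daily conversion can look like, the hypothetical function `cum_to_daily` below takes the difference between consecutive time steps and keeps the first value unchanged. It assumes that time runs along axis 0 and that accumulation starts at the first time step; the real sub-module may differ.

```python
import numpy as np

def cum_to_daily(cum):
    """Hypothetical stand-in for cum2inst (not the actual sub-module).

    cum: array of shape (time, members) holding values accumulated
    since the start of the forecast.
    """
    cum = np.asarray(cum, dtype=float)
    daily = np.empty_like(cum)
    daily[0] = cum[0]                  # first step: nothing to subtract
    daily[1:] = np.diff(cum, axis=0)   # later steps: value minus previous value
    return daily

# Made-up cumulative series (mm) for 4 days and 2 ensemble members
cum_mm = np.array([[1.0, 0.0],
                   [3.0, 0.5],
                   [3.0, 2.5],
                   [7.0, 2.5]])
daily_mm = cum_to_daily(cum_mm)
print(daily_mm)
```

Summing the daily values along time recovers the cumulative series, which is a quick way to check the conversion.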

#### 3.1.3 Extract rainfall data (total precipitation: 'tp')
The original data is in daily cumulative metres.

We convert the data from metres into millimetres and from cumulative into daily values.

In [66]:
dates_fore,Rain_fore = read_netcdf_data(path_fore_data,name_fore_file,'tp')
# Spatially averaged data and converted into mm
Rain_fore_cum = Rain_fore.mean(3).mean(2)*1000
# Cumulative to instantaneous data
Rain_fore_ens = cum2inst(Rain_fore_cum)

#### 3.1.4 Number of months covered by the forecast

In [67]:
if month>dates_fore[-1].month:
    num_months = dates_fore[-1].month - month + 1 + 12
else:
    num_months = dates_fore[-1].month - month + 1
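The month count above handles forecasts that cross the end of the year. The same arithmetic can be written with a modulo, shown here as a small sketch with a hypothetical helper `months_covered`. Like the cell above, it assumes the forecast spans less than 12 months.

```python
def months_covered(start_month, last_month):
    """Number of calendar months from start_month to last_month, inclusive.

    Hypothetical helper; assumes the period is shorter than 12 months.
    """
    return (last_month - start_month) % 12 + 1

# A forecast starting in November whose last date falls in May
print(months_covered(11, 5))
# A forecast starting in January whose last date falls in July
print(months_covered(1, 7))
```

For a start in November and a last date in May, both forms give 5 - 11 + 1 + 12 = 7 months.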

#### 3.1.5 Plot the extracted data (in this case the rainfall forecast ensemble)

In [68]:
# Number of members of the forecast ensemble 
num_mem = Rain_fore_ens.shape[1]
layout_1 = go.Layout(title  = 'rainfall forecast ensemble (ECMWF) - from '+dates_fore[0].strftime("%d/%m/%Y")+
                     ' to '+dates_fore[-1].strftime("%d/%m/%Y"),
                     xaxis  = dict(showgrid=False,title = 'days'),
                     yaxis  = dict(title = 'mm/day'),
                     width  = 900, 
                     height = 450,
                    showlegend = True)
fig_1 = go.Figure(layout = layout_1)
# We define the traces (layers of data)

for i in range(num_mem):
    fig_1.add_trace(go.Scatter(x = dates_fore,
                          y       = Rain_fore_ens[:,i],
                          name    = 'member '+str(i), 
                          line    = dict(color='blue', width=1),
                              opacity = 0.2))
fig_1.add_trace(go.Scatter(x = dates_fore,
                          y       = np.mean(Rain_fore_ens, axis =1),
                          name    = 'mean', 
                          line    = dict(color='blue', width=2),
                              opacity = 1))
fig_1

### 3.2 Observed weather data
In order to bias correct the forecast, we need to estimate the bias, i.e. the difference between the forecast and the observations. For this purpose, we load the available observed weather data for our study catchment.

In [69]:
# File path
path_obs_data = 'Historical data'
name_obs_file = 'Hist_clim_data.csv'

In [70]:
dates_obs,Temp_obs = read_csv_data(path_obs_data,name_obs_file,'Temp')
dates_obs,Rain_obs = read_csv_data(path_obs_data,name_obs_file,'Rain')
dates_obs,PET_obs  = read_csv_data(path_obs_data,name_obs_file,'PET')

Plotting the available observed data (rainfall)

In [71]:
layout_2 = go.Layout(title  = 'observed daily rainfall - from '+dates_obs[0].strftime("%d/%m/%Y")+
                     ' to '+dates_obs[-1].strftime("%d/%m/%Y"),
                     xaxis  = dict(showgrid=False,title = 'days'),
                     yaxis  = dict(title = 'mm/day'),
                     width  = 900, 
                     height = 450,
                    showlegend = False)
fig_2 = go.Figure(layout = layout_2)
# We define the traces (layers of data)
fig_2.add_trace(go.Scatter(x       = dates_obs,
                           y       = Rain_obs,
                           name    = None, 
                           line    = dict(color='blue', width=1),
                           opacity = 1))
fig_2

### 3.3 Control data generation (ECMWF seasonal hindcasts) + forecast bias correction
Here we call a function that applies a linear scaling bias correction to our weather forecasts. This approach is simple and often gives results similar to those of more sophisticated methods such as quantile or distribution mapping ([Crochemore et al., 2016](https://www.hydrol-earth-syst-sci.net/20/3601/2016/)). A monthly bias correction factor is calculated as the ratio between the observed values and the forecast (ensemble mean) values. The correction factor is then applied as a multiplicative factor to correct each raw daily forecast value.

A different factor is calculated and applied for each month and each year of the evaluation period. For example, for November 2005 we obtain the correction factor as the ratio between the mean observed rainfall in November from 1981 until 2004 (i.e. the average of 24 values) and the mean forecasted rainfall for the same months (i.e. the average of 24x25 values, as we have 25 ensemble members). For November 2006, we re-calculate the correction factor by also including the observations and forecasts of November 2005, hence taking averages over 25 years; and so forth.

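To make the arithmetic concrete, here is a minimal sketch of a multiplicative linear scaling for one calendar month. It is not the `linear_scaling` function from the `Bias_correction` sub-module, whose source is not shown here; the helper name `scaling_factor` and all numbers are made up for illustration.

```python
import numpy as np

def scaling_factor(obs_hist, fore_hist):
    """Ratio of mean observation to mean hindcast for one calendar month.

    Hypothetical helper, not part of Bias_correction.
    obs_hist:  shape (n_years,)            observed monthly means
    fore_hist: shape (n_years, n_members)  hindcast monthly means
    """
    return np.mean(obs_hist) / np.mean(fore_hist)

# Made-up November rainfall (mm/day): 3 past years, 2 ensemble members
obs_nov = np.array([3.0, 4.0, 5.0])
fore_nov = np.array([[1.5, 2.5],
                     [2.0, 2.0],
                     [1.0, 3.0]])
factor = scaling_factor(obs_nov, fore_nov)

# Apply the factor to raw daily forecast values, shape (days, members)
raw_daily = np.array([[0.0, 1.0],
                      [2.0, 0.5]])
corrected = factor * raw_daily
print(factor)
print(corrected)
```

In this toy case the observations average 4.0 mm/day and the hindcasts 2.0 mm/day, so every raw daily value is doubled. Dry days stay dry, since a multiplicative factor cannot change a zero.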
#### 3.3.1 Temperature

In [72]:
Temp_fore_corr = linear_scaling(path_fore_data,name_fore_file_end,
                                dates_fore,Temp_fore_ens,
                                dates_obs,Temp_obs,
                                't2m')

#### 3.3.2 Evaporation

In [73]:
Evap_fore_corr = linear_scaling(path_fore_data,name_fore_file_end,
                                dates_fore,Evap_fore_ens,
                                dates_obs,PET_obs,
                                'e')

#### 3.3.3 Rainfall

In [74]:
Rain_fore_corr = linear_scaling(path_fore_data,name_fore_file_end,
                                dates_fore,Rain_fore_ens,
                                dates_obs,Rain_obs,
                                'tp')

## 4. Plot results
### 4.1 Temperature
#### Plot the forecast ensemble: before bias correction vs after bias correction

In [75]:
layout_3 = go.Layout(title  = 'temperature forecast ensemble (ECMWF) - from '+dates_fore[0].strftime("%d/%m/%Y")+
                     ' to '+dates_fore[-1].strftime("%d/%m/%Y"),
                     xaxis  = dict(showgrid=False,title = 'days'),
                     yaxis  = dict(title = 'degC'),
                     width  = 900, 
                     height = 450,
                    showlegend = True)
fig_3 = go.Figure(layout = layout_3)
# We define the traces (layers of data)

for i in range(num_mem):
    fig_3.add_trace(go.Scatter(x = dates_fore,
                          y       = Temp_fore_ens[:,i],
                          name    = 'before BC '+str(i), 
                          line    = dict(color='blue', width=1),
                              opacity = 0.2))
    fig_3.add_trace(go.Scatter(x = dates_fore,
                          y       = Temp_fore_corr[:,i],
                          name    = 'after BC '+str(i), 
                          line    = dict(color='red', width=1),
                              opacity = 0.2))
fig_3

#### Plot the average forecast: before bias correction vs after bias correction

In [76]:
Temp_fore_month_mean = np.zeros(Temp_fore_ens.shape[0])
Temp_fore_corr_month_mean = np.zeros(Temp_fore_ens.shape[0])
for i in range(num_months):
    if month+i<=12:
        m = month+i
    else:
        m = month+i-12
    ID_month = np.where((dates_fore.month==m))[0]
    Temp_fore_month_mean [ID_month] = np.mean(Temp_fore_ens[ID_month])
    Temp_fore_corr_month_mean [ID_month] = np.mean(Temp_fore_corr[ID_month])

Horizontal lines represent the monthly average value

In [77]:
fig_4 = go.Figure(layout = layout_3)
# We define the traces (layers of data)
fig_4.add_trace(go.Scatter(x      = dates_fore,
                          y       = np.mean(Temp_fore_ens, axis =1),
                          name    = 'mean - before BC', 
                          line    = dict(color='blue', width=2),
                              opacity = 1))
fig_4.add_trace(go.Scatter(x      = dates_fore,
                          y       = Temp_fore_month_mean,
                          name    = 'monthly mean - before BC', 
                          line    = dict(color='blue', width=1,dash='dot'),
                              opacity = 1))
fig_4.add_trace(go.Scatter(x      = dates_fore,
                          y       = np.mean(Temp_fore_corr, axis =1),
                          name    = 'mean - after BC', 
                          line    = dict(color='red', width=2),
                              opacity = 1))
fig_4.add_trace(go.Scatter(x      = dates_fore,
                          y       = Temp_fore_corr_month_mean,
                          name    = 'monthly mean - after BC', 
                          line    = dict(color='red', width=1,dash='dot'),
                              opacity = 1))
fig_4

### 4.2 Evaporation
#### Plot the forecast ensemble: before bias correction vs after bias correction

In [78]:
layout_5 = go.Layout(title  = 'evaporation forecast ensemble (ECMWF) - from '+dates_fore[0].strftime("%d/%m/%Y")+
                     ' to '+dates_fore[-1].strftime("%d/%m/%Y"),
                     xaxis  = dict(showgrid=False,title = 'days'),
                     yaxis  = dict(title = 'mm/day'),
                     width  = 900, 
                     height = 450,
                    showlegend = True)
fig_5 = go.Figure(layout = layout_5)
# We define the traces (layers of data)

for i in range(num_mem):
    fig_5.add_trace(go.Scatter(x = dates_fore,
                          y       = Evap_fore_ens[:,i],
                          name    = 'before BC '+str(i), 
                          line    = dict(color='blue', width=1),
                              opacity = 0.2))
    fig_5.add_trace(go.Scatter(x = dates_fore,
                          y       = Evap_fore_corr[:,i],
                          name    = 'after BC '+str(i), 
                          line    = dict(color='red', width=1),
                              opacity = 0.2))
fig_5

In [79]:
Evap_fore_month_mean = np.zeros(Evap_fore_ens.shape[0])
Evap_fore_corr_month_mean = np.zeros(Evap_fore_ens.shape[0])
for i in range(num_months):
    if month+i<=12:
        m = month+i
    else:
        m = month+i-12
    ID_month = np.where((dates_fore.month==m))[0]
    Evap_fore_month_mean [ID_month] = np.mean(Evap_fore_ens[ID_month])
    Evap_fore_corr_month_mean [ID_month] = np.mean(Evap_fore_corr[ID_month])

#### Plot the average forecast: before bias correction vs after bias correction
Horizontal lines represent the monthly average value

In [80]:
fig_6 = go.Figure(layout = layout_5)
# We define the traces (layers of data)
fig_6.add_trace(go.Scatter(x      = dates_fore,
                          y       = np.mean(Evap_fore_ens, axis =1),
                          name    = 'mean - before BC', 
                          line    = dict(color='blue', width=2),
                              opacity = 1))
fig_6.add_trace(go.Scatter(x      = dates_fore,
                          y       = Evap_fore_month_mean,
                          name    = 'monthly mean - before BC', 
                          line    = dict(color='blue', width=1,dash='dot'),
                              opacity = 1))
fig_6.add_trace(go.Scatter(x      = dates_fore,
                          y       = np.mean(Evap_fore_corr, axis =1),
                          name    = 'mean - after BC', 
                          line    = dict(color='red', width=2),
                              opacity = 1))
fig_6.add_trace(go.Scatter(x      = dates_fore,
                          y       = Evap_fore_corr_month_mean,
                          name    = 'monthly mean - after BC', 
                          line    = dict(color='red', width=1,dash='dot'),
                              opacity = 1))
fig_6

### 4.3 Rainfall
#### Plot the forecast ensemble: before bias correction vs after bias correction

In [81]:
layout_7 = go.Layout(title  = 'rainfall forecast ensemble (ECMWF) - from '+dates_fore[0].strftime("%d/%m/%Y")+
                     ' to '+dates_fore[-1].strftime("%d/%m/%Y"),
                     xaxis  = dict(showgrid=False,title = 'days'),
                     yaxis  = dict(title = 'mm/day'),
                     width  = 900, 
                     height = 450,
                    showlegend = True)
fig_7 = go.Figure(layout = layout_7)

# We define the traces (layers of data)
for i in range(num_mem):
    fig_7.add_trace(go.Scatter(x = dates_fore,
                          y       = Rain_fore_ens[:,i],
                          name    = 'before BC '+str(i), 
                          line    = dict(color='blue', width=1),
                              opacity = 0.2))
    fig_7.add_trace(go.Scatter(x = dates_fore,
                          y       = Rain_fore_corr[:,i],
                          name    = 'after BC '+str(i), 
                          line    = dict(color='red', width=1),
                              opacity = 0.2))
fig_7

In [82]:
Rain_fore_month_mean = np.zeros(Rain_fore_ens.shape[0])
Rain_fore_corr_month_mean = np.zeros(Rain_fore_ens.shape[0])
for i in range(num_months):
    if month+i<=12:
        m = month+i
    else:
        m = month+i-12
    ID_month = np.where((dates_fore.month==m))[0]
    Rain_fore_month_mean [ID_month] = np.mean(Rain_fore_ens[ID_month])
    Rain_fore_corr_month_mean [ID_month] = np.mean(Rain_fore_corr[ID_month])

#### Plot the average forecast: before bias correction vs after bias correction
Horizontal lines represent the monthly average value

In [83]:
fig_8 = go.Figure(layout = layout_7)
# We define the traces (layers of data)
fig_8.add_trace(go.Scatter(x      = dates_fore,
                          y       = np.mean(Rain_fore_ens, axis =1),
                          name    = 'mean - before BC', 
                          line    = dict(color='blue', width=2),
                              opacity = 1))
fig_8.add_trace(go.Scatter(x      = dates_fore,
                          y       = Rain_fore_month_mean,
                          name    = 'monthly mean - before BC', 
                          line    = dict(color='blue', width=1,dash='dot'),
                              opacity = 1))
fig_8.add_trace(go.Scatter(x      = dates_fore,
                          y       = np.mean(Rain_fore_corr, axis =1),
                          name    = 'mean - after BC', 
                          line    = dict(color='red', width=2),
                              opacity = 1))
fig_8.add_trace(go.Scatter(x      = dates_fore,
                          y       = Rain_fore_corr_month_mean,
                          name    = 'monthly mean - after BC', 
                          line    = dict(color='red', width=1,dash='dot'),
                              opacity = 1))
fig_8

## 5. Save bias corrected data to a file (.csv)
### 5.1 Temperature

In [84]:
Fore_Temp = pd.DataFrame(Temp_fore_corr)
Fore_Temp.insert(0,'Date',dates_fore.strftime('%d/%m/%Y'))
Fore_Temp.to_csv('Results/'+origin_centre+'/Fore_Temp.csv',index = None)

### 5.2 Evaporation

In [85]:
Fore_Evap = pd.DataFrame(Evap_fore_corr)
Fore_Evap.insert(0,'Date',dates_fore.strftime('%d/%m/%Y'))
Fore_Evap.to_csv('Results/'+origin_centre+'/Fore_Evap.csv',index = None)

### 5.3 Rainfall

In [86]:
Fore_Rain = pd.DataFrame(Rain_fore_corr)
Fore_Rain.insert(0,'Date',dates_fore.strftime('%d/%m/%Y'))
Fore_Rain.to_csv('Results/'+origin_centre+'/Fore_Rain.csv',index = None)

#### Let's go to the next section: [Generation of reservoir inflow ensemble forecasts from weather forecasts](../2%20-%20Inflow%20forecast/2.a%20Generation%20of%20reservoir%20inflow%20ensemble%20forecasts%20from%20weather%20forecasts.ipynb)

## References

Crochemore, L., Ramos, M.-H., and Pappenberger, F.: Bias correcting precipitation forecasts to improve the skill of seasonal streamflow forecasts, Hydrol. Earth Syst. Sci., 20, 3601–3618, https://doi.org/10.5194/hess-20-3601-2016, 2016.