# Bias Correction

To use the bias correction tools in the `geoglows` package, you need three things:

1. Observed streamflow data
2. Simulated historical streamflow data from the `geoglows` model
3. Data to correct: either the historical data or any other time series of simulated flows from the `geoglows` model

The simulated historical and forecasted streamflow data are available from the GEOGloWS ECMWF Streamflow model through the `geoglows` Python package.

Methods for recording streamflow, and the formats used to store it, vary by country, and not all streamflow data is publicly available online. As a result, the `geoglows` package has no generic tool for retrieving observed streamflow; you will need to provide it yourself.
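If your agency's files use a different layout or different units, a few lines of pandas can reshape them into the two-column format described in Step 1. The sketch below assumes a hypothetical export with columns named `date` and `discharge_cfs` (flow in cubic feet per second); adjust the names to match your data. It converts flows to m³/s using 1 ft³ = 0.3048³ m³.

```python
import io

import pandas as pd

# Hypothetical agency export; the column names "date" and "discharge_cfs"
# are invented for this example and will differ in real files.
raw_text = """date,discharge_cfs
2020-01-01,353.1
2020-01-02,370.8
2020-01-03,
"""

# 1 ft^3/s = 0.3048**3 m^3/s (about 0.0283168 m^3/s)
CFS_TO_CMS = 0.3048 ** 3

raw = pd.read_csv(io.StringIO(raw_text))

# Two columns: "datetime" first, then streamflow in m^3/s
observed = pd.DataFrame({
    "datetime": pd.to_datetime(raw["date"]),
    "streamflow_m^3/s": raw["discharge_cfs"] * CFS_TO_CMS,
})
observed = observed.dropna()  # drop days with no recorded flow

# Returns the CSV text; pass a file name instead to write it to disk
csv_text = observed.to_csv(index=False)
print(csv_text)
```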

Let's start by installing the `geoglows` package in this notebook environment and importing the other dependencies we'll need.

```python
# Start by installing the package and importing it to your code. Run this cell to do that.
!pip install geoglows==0.18.3
import geoglows
from IPython.display import display, HTML
import pandas as pd
from google.colab import files
```

## Step 1: Upload a CSV from your computer for bias correction

If you have a file on your computer that you would like to use for bias correction, you can upload it to this notebook environment with the Google Colaboratory `files.upload()` feature. Note that the file is not saved once you leave this page; if you revisit the page, you will need to upload it again.

Your CSV should have two columns, with a header row giving each column's name. The first column should be named `datetime` and contain dates in a standard format. The second column may have any name but ***must*** contain streamflow values in cubic meters per second (m³/s).
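A quick way to catch formatting problems before uploading is to check the file against these rules. `load_observed_csv` below is a hypothetical helper written for this notebook (it is not part of `geoglows`): it raises a `ValueError` when the layout does not match and otherwise returns the data indexed by date.

```python
import io

import pandas as pd


def load_observed_csv(source):
    """Read a two-column observed-flow CSV and check its layout.

    Hypothetical helper for this notebook, not part of the geoglows package.
    `source` may be a file path or any file-like object.
    """
    df = pd.read_csv(source)
    if df.shape[1] != 2:
        raise ValueError(f"expected 2 columns, found {df.shape[1]}")
    if df.columns[0] != "datetime":
        raise ValueError(f"first column must be named 'datetime', not {df.columns[0]!r}")
    flow_col = df.columns[1]
    df["datetime"] = pd.to_datetime(df["datetime"])  # raises on unparseable dates
    df[flow_col] = pd.to_numeric(df[flow_col])        # raises on non-numeric flows
    if (df[flow_col] < 0).any():
        raise ValueError("streamflow values must be non-negative (m^3/s)")
    return df.set_index("datetime")


# Usage with a small in-memory CSV
sample = io.StringIO("datetime,flow\n2020-01-01,12.5\n2020-01-02,13.0\n")
observed = load_observed_csv(sample)
print(observed)
```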

OPTIONAL: IF YOU NEED DEMO DATA, DOWNLOAD THIS CSV

https://www.hydroshare.org/resource/d222676fbd984a81911761ca1ba936bf/data/contents/Discharge_Data/23187280.csv

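```python
# Opens a file picker; the uploaded files are returned as a dict keyed by file name
uploaded = files.upload()
for fn in uploaded.keys():
    uploaded_file_name = fn  # if several files are uploaded, the last one is used
    print(f'User uploaded file "{fn}"')
```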

```python
# Read the CSV with the first (datetime) column as the index,
# then mark the timestamps as UTC
observed_data = pd.read_csv(uploaded_file_name, index_col=0)
observed_data.index = pd.to_datetime(observed_data.index).tz_localize('UTC')
print('Here is a preview of your data')
print(observed_data.head(10))
```
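Gauge records often contain gaps, duplicate timestamps, or placeholder values, so it may help to tidy the observed data before correcting. The sketch below uses a small invented stand-in for `observed_data`; the cleaning rules (drop missing and negative flows, keep the first of any duplicated timestamps) are assumptions you should adapt to your own record. If your gauge reports at a finer interval than the simulated data, the last step shows one way to aggregate to daily means.

```python
import pandas as pd

# Invented stand-in for the observed_data DataFrame loaded above
observed_data = pd.DataFrame(
    {"flow": [5.0, -1.0, 4.0, 4.0, None]},
    index=pd.to_datetime(
        ["2020-01-03", "2020-01-02", "2020-01-01", "2020-01-01", "2020-01-04"]
    ).tz_localize("UTC"),
)

cleaned = observed_data.sort_index()                         # chronological order
cleaned = cleaned[~cleaned.index.duplicated(keep="first")]   # one value per timestamp
cleaned = cleaned.dropna()                                   # remove missing readings
cleaned = cleaned[cleaned.iloc[:, 0] >= 0]                   # drop negative flows

# Optional: daily mean flow (days without any valid reading become NaN)
daily = cleaned.resample("D").mean()
print(cleaned)
print(daily)
```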

## Step 2: Get the historical data for correction

Hydrologic models number the streams they simulate so that the results can be stored and organized. The GEOGloWS ECMWF model calls these numbers `reach_id`s and provides an interface for finding them programmatically from a latitude and longitude.

Use the latitude and longitude of your stream gauge to find the model's reporting point closest to your gauge for comparison.
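`latlon_to_reach` handles this lookup for you. To build intuition for what "closest" means, the sketch below finds the nearest of a few candidate points by great-circle (haversine) distance. It is only an illustration: the candidate IDs and coordinates are invented, and the package's own geoprocessing may work differently, for example by also taking the stream network into account.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_point(lat, lon, candidates):
    """Return (id, distance_km) of the candidate closest to (lat, lon).

    `candidates` maps a hypothetical point ID to a (lat, lon) tuple.
    """
    return min(
        ((pid, haversine_km(lat, lon, plat, plon)) for pid, (plat, plon) in candidates.items()),
        key=lambda item: item[1],
    )


# Invented reporting points near the demo gauge, for illustration only
points = {101: (7.80, -73.80), 102: (7.90, -73.70), 103: (7.50, -74.00)}
print(nearest_point(7.81179264, -73.8105294, points))
```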

OPTIONAL: IF YOU DOWNLOADED THE DEMO DATA, USE THIS LAT/LON PAIR  
latitude = 7.81179264  
longitude = -73.8105294

```python
# Edit this cell with the latitude and longitude of your stream gauge
latitude = 7.81179264
longitude = -73.8105294
```

```python
# This function performs some geoprocessing and may take a few seconds to complete
reach_id = geoglows.streamflow.latlon_to_reach(latitude, longitude)['reach_id']
print(reach_id)
```

```python
# Retrieve the simulated historical streamflow for this reach
historical_data = geoglows.streamflow.historic_simulation(reach_id)
```
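Because the correction compares your observed flows with the simulated historical flows, it is worth checking how much the two records overlap before continuing. The sketch below uses invented stand-in series; with the real data you would compare `observed_data.index` with `historical_data.index`. How much overlap is enough is a judgment call, but a short overlap, or one that misses some months of the year, gives the correction little to work with.

```python
import pandas as pd

# Invented stand-ins for observed_data and historical_data
observed = pd.Series(
    [10.0, 11.0, 12.0],
    index=pd.date_range("2000-01-02", periods=3, freq="D", tz="UTC"),
)
simulated = pd.Series(
    [9.0, 9.5, 10.0, 10.5],
    index=pd.date_range("2000-01-01", periods=4, freq="D", tz="UTC"),
)

# Timestamps present in both records
common = observed.index.intersection(simulated.index)
print(f"{len(common)} overlapping days: {common.min().date()} to {common.max().date()}")
print("Months covered:", sorted(common.month.unique()))
```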

## Step 3: Get forecast data for correction

We will use the same `reach_id` as for the historical data to retrieve the forecasted streamflow.
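As its name suggests, the `stats` table summarizes the spread of the ensemble forecast. The sketch below shows that kind of summary (minimum, quartiles, mean, maximum across members at each time step) on an invented ensemble table; the statistics and column names that `geoglows` actually returns may differ.

```python
import pandas as pd

# Invented ensemble table: one column per member, one row per time step (m^3/s)
ensembles = pd.DataFrame(
    {
        "member_1": [10.0, 12.0],
        "member_2": [14.0, 16.0],
        "member_3": [12.0, 20.0],
    },
    index=pd.date_range("2024-01-01", periods=2, freq="D", tz="UTC"),
)

# Summarize across members (axis=1) at each time step
summary = pd.DataFrame({
    "min": ensembles.min(axis=1),
    "p25": ensembles.quantile(0.25, axis=1),
    "mean": ensembles.mean(axis=1),
    "p75": ensembles.quantile(0.75, axis=1),
    "max": ensembles.max(axis=1),
})
print(summary)
```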

```python
# Forecast summary statistics, individual ensemble members, and forecast records for this reach
stats = geoglows.streamflow.forecast_stats(reach_id)
ensembles = geoglows.streamflow.forecast_ensembles(reach_id)
records = geoglows.streamflow.forecast_records(reach_id)
```

## Step 4: Perform the bias correction

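Use the `geoglows.bias` tools to correct the bias using your observed data. Each correction below is computed from your observed data together with the simulated historical data.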
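One widely used technique for correcting simulated streamflow against a gauge record is quantile mapping on monthly flow duration curves: for each calendar month, a simulated value's non-exceedance probability among that month's simulated flows is found, and it is replaced by the observed flow at the same probability. The sketch below implements that idea with NumPy and pandas; it is an illustration of the technique, not a description of what `geoglows.bias` does internally. It assumes a unique, timezone-aware daily index and uses linear interpolation between curve points.

```python
import numpy as np
import pandas as pd


def monthly_quantile_map(simulated, observed):
    """Map each simulated value through monthly flow duration curves.

    Illustrative sketch only, not geoglows' implementation. Months without
    both simulated and observed data are left as NaN.
    """
    corrected = pd.Series(np.nan, index=simulated.index, name="corrected")
    for month in range(1, 13):
        sim_m = simulated[simulated.index.month == month].dropna()
        obs_m = observed[observed.index.month == month].dropna()
        if sim_m.empty or obs_m.empty:
            continue  # no data to build a curve for this month
        sim_sorted = np.sort(sim_m.to_numpy())
        obs_sorted = np.sort(obs_m.to_numpy())
        sim_probs = np.linspace(0, 1, len(sim_sorted))
        obs_probs = np.linspace(0, 1, len(obs_sorted))
        # simulated value -> probability on the simulated curve
        p = np.interp(sim_m.to_numpy(), sim_sorted, sim_probs)
        # probability -> flow on the observed curve
        corrected.loc[sim_m.index] = np.interp(p, obs_probs, obs_sorted)
    return corrected


# Usage with invented daily data (all in January)
idx = pd.date_range("2001-01-01", periods=4, freq="D", tz="UTC")
simulated = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)
observed = pd.Series([2.0, 4.0, 6.0, 8.0], index=idx)
print(monthly_quantile_map(simulated, observed))
```

Grouping by calendar month lets the correction differ between wet and dry seasons; with many tied simulated values the interpolation becomes ambiguous, which a production implementation would need to handle.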

```python
# Correct the historical simulation, then each forecast product
corrected_historical = geoglows.bias.correct_historical(historical_data, observed_data)
corrected_stats = geoglows.bias.correct_forecast(stats, historical_data, observed_data)
corrected_ensembles = geoglows.bias.correct_forecast(ensembles, historical_data, observed_data)
corrected_records = geoglows.bias.correct_forecast(records, historical_data, observed_data, use_month=-1)
```

## Step 5: Plot the results

```python
# You can add more entries to the dictionary and they will appear in the title of the graph
titles = {'Reach ID': reach_id, 'bias_corrected': True}
```

### Historical Data
Use the legend on the right of the plot to toggle layers on and off.

```python
# This is a plot of the original simulated, corrected simulated, and observed data
geoglows.plots.corrected_historical(corrected_historical, historical_data, observed_data, titles=titles).show()
```

### Forecasted Data

Since there are so many lines on the forecast plots, we recommend plotting the corrected and original forecasts as separate figures rather than overlaying them.

```python
# corrected data
geoglows.plots.forecast_stats(corrected_stats, titles=titles).show()
```

```python
# original data
geoglows.plots.forecast_stats(stats).show()
```

## Step 6: Statistics, Summaries, Averages, etc.

The `geoglows` package has many tools for analyzing how much the bias correction improved the streamflow simulations. These are based on the statistical analysis performed by the `hydrostats` and `HydroErr` Python packages.
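`hydrostats` and `HydroErr` implement many goodness-of-fit metrics. To show what two widely used ones measure, the sketch below computes root mean squared error (RMSE) and Nash-Sutcliffe efficiency (NSE) by hand with NumPy on invented flows; the statistics table in this step may report a different selection of metrics.

```python
import numpy as np


def rmse(sim, obs):
    """Root mean squared error, in the units of the flows (m^3/s)."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))


def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - sum((sim - obs)^2) / sum((obs - mean(obs))^2).

    1 is a perfect fit; 0 means no better than predicting the observed mean.
    """
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    return float(1 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))


# Invented flows in m^3/s
observed = [2.0, 4.0, 6.0, 8.0]
original = [1.0, 2.0, 3.0, 4.0]
corrected = [2.0, 4.0, 6.0, 8.5]
print("original: ", rmse(original, observed), nse(original, observed))
print("corrected:", rmse(corrected, observed), nse(corrected, observed))
```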

```python
# Scatter plots of the original and corrected simulations against the observed data
geoglows.plots.corrected_scatterplots(corrected_historical, historical_data, observed_data, titles=titles).show()
```

```python
# This is a plot of the monthly averages
geoglows.plots.corrected_month_average(corrected_historical, historical_data, observed_data, titles=titles).show()
```
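The long-term monthly averages can also be computed directly with pandas, for example to export them as a table. This sketch assumes the plot averages each calendar month across all years; the three columns are invented stand-ins for the observed, original, and corrected flows.

```python
import pandas as pd

idx = pd.date_range("2001-01-01", "2002-12-31", freq="D", tz="UTC")
month_number = idx.month.to_numpy().astype(float)

# Invented daily flows in m^3/s: observed equals the month number,
# the original simulation is half of it, and the corrected one matches it
combined = pd.DataFrame(
    {
        "observed": month_number,
        "original": month_number * 0.5,
        "corrected": month_number,
    },
    index=idx,
)

# Mean of every January, every February, ... across all years
monthly_avg = combined.groupby(combined.index.month).mean()
print(monthly_avg)
```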

```python
# This is a plot of the daily averages
geoglows.plots.corrected_day_average(corrected_historical, historical_data, observed_data, titles=titles).show()
```
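Similarly, assuming "daily averages" means the mean flow on each day of the year across all years, pandas can compute it by grouping on the day of the year. Note that in leap years every date after February 28 shifts by one day of the year, and day 366 only exists in leap years. The flows below are invented.

```python
import pandas as pd

idx = pd.date_range("2001-01-01", "2002-12-31", freq="D", tz="UTC")

# Invented daily flows in m^3/s: 1.0 throughout 2001 and 3.0 throughout 2002
flows = pd.Series(1.0, index=idx)
flows[flows.index.year == 2002] = 3.0

# Mean of each day of the year across all years
daily_avg = flows.groupby(flows.index.dayofyear).mean()
print(daily_avg.head())
```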

```python
# This is a plot of the cumulative annual volumes
geoglows.plots.corrected_volume_compare(corrected_historical, historical_data, observed_data, titles=titles).show()
```
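A cumulative volume follows from the flows by simple arithmetic: a daily mean flow of Q m³/s carries Q × 86,400 m³ of water in one day, and the running total restarts each year. The sketch below assumes daily mean flows and calendar years; whether the plot accumulates in exactly this way is an assumption. The values are invented and span a year boundary to show the reset.

```python
import pandas as pd

SECONDS_PER_DAY = 86_400

idx = pd.date_range("2001-12-30", "2002-01-02", freq="D", tz="UTC")
flows = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)  # invented daily mean flows, m^3/s

# Daily volume in m^3, then a running total that restarts each calendar year
daily_volume = flows * SECONDS_PER_DAY
cumulative_volume = daily_volume.groupby(daily_volume.index.year).cumsum()
print(cumulative_volume)
```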

```python
# This is a table of a few important statistics
display(HTML(geoglows.bias.statistics_tables(corrected_historical, historical_data, observed_data)))
```

## Optional: Download your corrected results as CSV

```python
corrected_historical.to_csv('corrected_historical_streamflow.csv')
corrected_stats.to_csv('corrected_forecasted_stats.csv')
corrected_ensembles.to_csv('corrected_forecasted_ensembles.csv')
files.download('corrected_historical_streamflow.csv')
files.download('corrected_forecasted_stats.csv')
files.download('corrected_forecasted_ensembles.csv')
```