# Bias correction of Personal Weather Stations


This notebook shows how to calculate Bias Correction Factors (BCF) for automated personal weather stations (PWS) with the Python package `pypwsqc`. The package is based on the original R code available at https://github.com/LottedeVos/PWSQC/. In the original implementation, the bias correction is embedded in the Station Outlier filter. In this notebook, the bias correction can be performed separately. We recommend applying the other QC filters first and calculating BCF only on filtered data. The BCF can also be calculated together with the Station Outlier filter. To do that, use the Station Outlier filter notebook and set the variable `bias_corr` to _True_.

[Publication: de Vos, L. W., Leijnse, H., Overeem, A., & Uijlenhoet, R. (2019). Quality control for crowdsourced personal weather stations to enable operational rainfall monitoring. Geophysical Research Letters, 46(15), 8820-8829](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2019GL083731)

`pypwsqc` depends on the `poligrain`, `xarray`, `pandas` and `numpy` packages. Make sure to install and import the required packages first.

In [None]:
import poligrain as plg
import xarray as xr
import pypwsqc

## Download example data

In this example, we use an open PWS dataset from Amsterdam, called the "AMS PWS" dataset. Running the cell below downloads an example NetCDF file to your current working directory (this requires an internet connection).

In [None]:
!curl -OL https://github.com/OpenSenseAction/OS_data_format_conventions/raw/main/notebooks/data/OpenSense_PWS_example_format_data.nc

## Data preparations

This package handles rainfall data as `xarray` Datasets. The data set must have `time` and `id` dimensions, `latitude` and `longitude` as coordinates, and `rainfall` as a data variable.

An example of how to convert .csv data to an `xarray` data set is found [here](https://github.com/OpenSenseAction/OS_data_format_conventions/blob/main/notebooks/PWS_example_dataset.ipynb).
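
As a minimal sketch of the layout described above, the following cell builds a tiny synthetic data set with the required dimensions, coordinates and data variable. The station IDs, coordinates, rainfall values and the `(id, time)` dimension order are invented for illustration and are not taken from the AMS PWS data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical example: 3 stations, 4 time steps of 5 minutes each
time = pd.date_range("2020-01-01", periods=4, freq="5min")
station_ids = ["pws_a", "pws_b", "pws_c"]

ds_example = xr.Dataset(
    data_vars={
        # rainfall with dimensions (id, time); all values are invented
        "rainfall": (("id", "time"), np.zeros((3, 4))),
    },
    coords={
        "time": time,
        "id": station_ids,
        "longitude": ("id", [4.88, 4.90, 4.92]),
        "latitude": ("id", [52.36, 52.37, 52.38]),
    },
)

# Check that the structure matches the requirements listed above
assert {"time", "id"} <= set(ds_example.dims)
assert {"latitude", "longitude"} <= set(ds_example.coords)
assert "rainfall" in ds_example.data_vars
```

The same three checks can be run on your own data set before starting the quality control.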

We now load the data set under the name `ds_pws`.

In [None]:
ds_pws = xr.open_dataset("OpenSense_PWS_example_format_data.nc")
ds_pws

### Reproject coordinates 

First, we reproject the coordinates to a local metric coordinate reference system to allow for distance calculations. In the Amsterdam example we use EPSG:25832. **Remember to use a local metric reference system for your use case!** We use the function `spatial.project_point_coordinates` in the `poligrain` package.

In [None]:
ds_pws.coords["x"], ds_pws.coords["y"] = plg.spatial.project_point_coordinates(
    x=ds_pws.longitude, y=ds_pws.latitude, target_projection="EPSG:25832"
)

### Create distance matrix

Then, we calculate the distances between all stations in our data set. If your data set has a large number of stations, this can take some time.

In [None]:
distance_matrix = plg.spatial.calc_point_to_point_distances(ds_pws, ds_pws)
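
To illustrate what such a distance matrix contains, here is a minimal NumPy sketch that computes pairwise Euclidean distances from projected coordinates. The coordinates are invented, and this is not necessarily how `poligrain` computes the distances internally.

```python
import numpy as np

# Hypothetical projected coordinates in metres for three stations
x = np.array([0.0, 3000.0, 0.0])
y = np.array([0.0, 4000.0, 12000.0])

# Broadcasting yields all pairwise coordinate differences, shape (3, 3)
dx = x[:, np.newaxis] - x[np.newaxis, :]
dy = y[:, np.newaxis] - y[np.newaxis, :]

# Euclidean distance between every pair of stations
dist = np.hypot(dx, dy)
```

The result is symmetric with zeros on the diagonal. Its size grows with the square of the number of stations, which is why the calculation becomes slow for large networks.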

### Select range for neighbouring checks

The quality control compares the time series of each station with the time series of neighbouring stations within a specified range `max_distance`. The appropriate range depends on the use case and area of interest. In this example, we use 10,000 meters. `max_distance` is called `d` in the [original publication](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2019GL083731).

### Select considered range around each station

In [None]:
max_distance = 10e3
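
As a rough sketch of what "neighbouring stations within `max_distance`" means, the cell below counts the neighbours of each station from a small, invented distance matrix. Treating the range boundary as inclusive (`<=`) is an assumption made here, and `pypwsqc` may handle it differently.

```python
import numpy as np

max_distance = 10e3  # metres, as in the cell above

# Hypothetical symmetric distance matrix in metres for three stations
dist = np.array(
    [
        [0.0, 5000.0, 12000.0],
        [5000.0, 0.0, 8544.0],
        [12000.0, 8544.0, 0.0],
    ]
)

# A station counts as a neighbour if it lies within max_distance
# and is not the station itself (the diagonal is excluded)
is_neighbour = (dist <= max_distance) & ~np.eye(len(dist), dtype=bool)
n_neighbours = is_neighbour.sum(axis=1)
```

A station with few or no neighbours has little to be compared against, so it is worth checking these counts when choosing `max_distance` for your own network.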

## Bias correction

Now the data set is prepared to calculate bias correction factors. 

A bias correction factor `bias_corr_factor` is calculated for each time step over a rolling window of `evaluation_period` time steps. Set the window length with the variable `evaluation_period` in the cell below. By default, the median rainfall of the neighbouring stations is used as reference. To use another data source as reference, add that data to the xarray data set as a variable named `reference`. [Here](https://github.com/OpenSenseAction/OS_data_format_conventions/blob/main/notebooks/PWS_example_dataset.ipynb) you can find an example of how to construct an xarray data set. `beta` is a bias correction parameter (default 0.2). `dbc` is the default bias correction factor (default 1).

In [None]:
evaluation_period = 8064  # rolling window length in time steps
beta = 0.2  # bias correction parameter
dbc = 1  # default bias correction factor
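
Since `evaluation_period` is given in time steps, the duration of the window depends on the temporal resolution of the data. The short calculation below assumes a resolution of 5 minutes. This is an assumption for illustration, so check the `time` coordinate of your own data set.

```python
from datetime import timedelta

evaluation_period = 8064          # number of time steps, as above
time_step = timedelta(minutes=5)  # assumed temporal resolution of the data

# Duration covered by the rolling window
window = evaluation_period * time_step
print(window.days)  # 28
```

The actual time step of a data set can be inspected with, for example, `ds_pws.time.diff("time")`.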

In [None]:
%%time

ds_pws_flagged = pypwsqc.flagging.bias_correction(
    ds_pws,
    evaluation_period,
    distance_matrix,
    max_distance,
    beta,
    dbc,
)
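
To give an intuition for how `evaluation_period`, `beta` and `dbc` could interact, here is a simplified sketch for a single station with invented data. The function `sketch_bias_correction` is hypothetical and is not the `pypwsqc` implementation. In particular, the update rule (replace the factor only when it changes by more than a fraction `beta`) is our reading of the approach in de Vos et al. (2019) and should be treated as an assumption. Missing values and minimum data requirements are ignored.

```python
import numpy as np


def sketch_bias_correction(station, reference, window, beta=0.2, dbc=1.0):
    """Illustrative rolling bias correction factor for one station."""
    bcf = np.full(len(station), dbc, dtype=float)
    current = dbc  # start from the default bias correction factor
    for t in range(len(station)):
        # Rolling window ending at time step t
        start = max(0, t - window + 1)
        station_sum = station[start : t + 1].sum()
        reference_sum = reference[start : t + 1].sum()
        if station_sum > 0 and reference_sum > 0:
            # Factor that would match the station total to the reference total
            candidate = reference_sum / station_sum
            # Assumed rule: update only if the change exceeds a fraction beta
            if abs(np.log(candidate / current)) > np.log(1 + beta):
                current = candidate
        bcf[t] = current
    return bcf


# Invented data: the station reports half of the reference rainfall
reference = np.array([0.0, 1.0, 2.0, 0.0, 1.0])
station = 0.5 * reference
bcf = sketch_bias_correction(station, reference, window=3)
print(bcf)  # [1. 2. 2. 2. 2.]
```

In this sketch the factor stays at `dbc` until rainfall is observed and then settles at 2, the value that brings the station total in line with the reference total over the window.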

### Save flagged dataset

In [None]:
ds_pws_flagged.to_netcdf("biascorr_dataset.nc")