# Faulty Zeroes Filter
## QC protocol for Private Weather Stations

This notebook shows how to use the 'Faulty Zeroes filter' in the Python package `pypwsqc`, a quality control protocol developed for automated private weather stations (PWS).

The package is based on the original R code available at https://github.com/LottedeVos/PWSQC/.

[Publication: de Vos, L. W., Leijnse, H., Overeem, A., & Uijlenhoet, R. (2019). Quality control for crowdsourced personal weather stations to enable operational rainfall monitoring. Geophysical Research Letters, 46(15), 8820-8829](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2019GL083731)

`pypwsqc` depends on the `poligrain`, `xarray`, `pandas` and `numpy` packages. Make sure to install and import the required packages first.
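
Before importing, it can help to check which of the packages listed above are available in the current environment. The following is a minimal sketch using only the standard library. The helper name `missing_packages` is made up for this example, and it assumes that each package's import name matches the name given above.

```python
from importlib.util import find_spec

# Packages named in this notebook; import names are assumed to match.
REQUIRED = ["numpy", "pandas", "xarray", "poligrain", "pypwsqc"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", missing or "none")
```

If the list is not empty, install the missing packages before running the cells below.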

In [None]:
import numpy as np
import poligrain as plg
import xarray as xr
import pypwsqc

## Download example data

In this example, we use an open PWS dataset from Amsterdam, called the "AMS PWS" dataset. Running the cell below downloads an example NetCDF file to your current working directory (provided your machine is connected to the internet).

In [None]:
!curl -OL https://github.com/OpenSenseAction/OS_data_format_conventions/raw/main/notebooks/data/OpenSense_PWS_example_format_data.nc

## Data preparations

This package handles rainfall data as `xarray` Datasets. The dataset must have `time` and `id` dimensions, `latitude` and `longitude` as coordinates, and `rainfall` as a data variable.

An example of how to convert .csv data to an `xarray` dataset is found [here](https://github.com/OpenSenseAction/OS_data_format_conventions/blob/main/notebooks/PWS_example_dataset.ipynb).

We now load the dataset under the name `ds_pws`.

In [None]:
ds_pws = xr.open_dataset("OpenSense_PWS_example_format_data.nc")
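
The layout requirements described above can be checked with a small helper before running the filter. This is only a sketch: `check_pws_dataset` is a hypothetical name, not part of `pypwsqc`, and it relies solely on the `dims`, `coords` and `data_vars` attributes that an `xarray` Dataset exposes.

```python
def check_pws_dataset(ds):
    """Return a list of layout problems; an empty list means the layout looks right."""
    problems = []
    # Required dimensions
    for dim in ("time", "id"):
        if dim not in ds.dims:
            problems.append(f"missing dimension '{dim}'")
    # Required coordinates
    for coord in ("latitude", "longitude"):
        if coord not in ds.coords:
            problems.append(f"missing coordinate '{coord}'")
    # Required data variable
    if "rainfall" not in ds.data_vars:
        problems.append("missing data variable 'rainfall'")
    return problems
```

For a correctly formatted dataset, `check_pws_dataset(ds_pws)` should return an empty list.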

### Reproject coordinates 

First, we reproject the coordinates to a local metric coordinate reference system to allow for distance calculations. In the Amsterdam example we use EPSG:25832. **Remember to use a local metric reference system for your use case!** We use the function `spatial.project_point_coordinates` in the `poligrain` package.

In [None]:
ds_pws.coords["x"], ds_pws.coords["y"] = plg.spatial.project_point_coordinates(
    x=ds_pws.longitude, y=ds_pws.latitude, target_projection="EPSG:25832"
)

### Create distance matrix

Then, we calculate the distances between all stations in our data set. If your data set has a large number of stations this can take some time.

In [None]:
distance_matrix = plg.spatial.calc_point_to_point_distances(ds_pws, ds_pws)
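
To illustrate what such a distance matrix contains, here is a minimal NumPy sketch for three made-up stations. It is not the `poligrain` implementation; `pairwise_distances` is a hypothetical helper that assumes plain Euclidean distances on projected `x`/`y` coordinates in meters.

```python
import numpy as np


def pairwise_distances(x, y):
    """Euclidean distance between every pair of points (same units as x and y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x[:, None] - x[None, :]  # coordinate differences for all pairs
    dy = y[:, None] - y[None, :]
    return np.hypot(dx, dy)


# Three made-up stations with projected coordinates in meters
x = [0.0, 3000.0, 0.0]
y = [0.0, 4000.0, 12000.0]
dist = pairwise_distances(x, y)
print(dist)
```

The result is symmetric with zeros on the diagonal; entry `[i, j]` is the distance between station `i` and station `j`.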

### Select range for neighbouring checks

The quality control is performed by comparing the time series of each station with the time series of neighbouring stations within a specified range `max_distance`. The selected range depends on the use case and area of interest. In this example, we use 10,000 meters. `max_distance` is called `d` in the [original publication](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2019GL083731).

### Select considered range around each station

In [None]:
max_distance = 10e3  # range for neighbour selection in meters (10 km)
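
How a range turns into a set of neighbours can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the `pypwsqc` code: `neighbour_stats` is a hypothetical helper, a neighbour is taken to be any other station strictly closer than `max_distance`, and rainfall is given as a plain array of shape `(n_stations, n_times)`.

```python
import numpy as np


def neighbour_stats(rain, dist, max_distance):
    """Per station and time step: median rainfall of the neighbours and the
    number of neighbours that report data (i.e. are not NaN)."""
    rain = np.asarray(rain, dtype=float)
    dist = np.asarray(dist, dtype=float)
    # A neighbour is any *other* station closer than max_distance.
    is_nbr = (dist < max_distance) & ~np.eye(len(dist), dtype=bool)
    n_reporting = np.zeros(rain.shape, dtype=int)
    median = np.full(rain.shape, np.nan)
    for i in range(rain.shape[0]):
        nbr_rain = rain[is_nbr[i]]  # shape (n_neighbours, n_times)
        n_reporting[i] = (~np.isnan(nbr_rain)).sum(axis=0)
        has_data = n_reporting[i] > 0
        if has_data.any():
            # Median over neighbours, ignoring NaN, only where data exist
            median[i, has_data] = np.nanmedian(nbr_rain[:, has_data], axis=0)
    return median, n_reporting


# Made-up distances (meters) and rainfall for three stations, three time steps
dist = np.array([[0.0, 5000.0, 12000.0],
                 [5000.0, 0.0, 8544.0],
                 [12000.0, 8544.0, 0.0]])
rain = np.array([[0.0, 0.0, 1.0],
                 [1.0, 2.0, np.nan],
                 [3.0, np.nan, np.nan]])
median, n_reporting = neighbour_stats(rain, dist, max_distance=10e3)
print(n_reporting)
```

In this toy example, station 0 has only station 1 as a neighbour, while station 1 has both other stations within range.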

## Quality control

The dataset is now ready for quality control.

### Faulty Zeroes filter

Conditions for raising the Faulty Zeroes (FZ) flag:

* The median rainfall of neighbouring stations within range `max_distance` is larger than zero for at least `nint` time intervals while the station itself reports zero rainfall.
* The FZ flag remains 1 until the station reports nonzero rainfall.
* The filter cannot be applied if fewer than `n_stat` neighbours are reporting data (the FZ flag is set to -1).
* NOTE! The filter cannot be applied if the station has reported NaN data in the last `nint` time steps. This gives more -1 flags than the original R implementation, which does not use this condition. This choice was made to ensure that time steps without data at the evaluated station are not mistakenly interpreted as having passed the quality control (if they were flagged with 0) or as having a Faulty Zeroes issue (if they were flagged with 1).

For settings of the parameters `nint` and `n_stat`, see Table 1 in the [original publication](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2019GL083731).
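
The conditions above can be sketched in plain Python for a single station. This is a simplified illustration, not the `pypwsqc` implementation: the function name `fz_flags_single_station` is hypothetical, the neighbour median and the number of reporting neighbours are assumed to be precomputed, and "at least `nint` time intervals" is interpreted here as a window of `nint` consecutive steps ending at the current one.

```python
import math


def fz_flags_single_station(station, nbr_median, n_reporting, nint, n_stat):
    """Illustrative FZ flags (-1, 0, 1) for one station; not the pypwsqc code."""
    flags = []
    faulty = False  # True while a faulty-zero episode is ongoing
    for j in range(len(station)):
        start = max(0, j - nint + 1)
        win_station = station[start : j + 1]
        win_median = nbr_median[start : j + 1]

        if station[j] > 0:
            # Nonzero rainfall ends any ongoing episode.
            faulty = False
        elif (
            len(win_station) == nint
            and all(s == 0 for s in win_station)
            and all(m > 0 for m in win_median)
        ):
            # Station dry while the neighbours were wet for nint steps in a row.
            faulty = True

        not_applicable = (
            len(win_station) < nint  # not enough history yet
            or n_reporting[j] < n_stat  # too few reporting neighbours
            or any(math.isnan(s) for s in win_station)  # gaps at the station
        )
        flags.append(-1 if not_applicable else int(faulty))
    return flags


# Made-up example: the station stays dry while its neighbours report rain
station = [0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0]
nbr_median = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0]
n_reporting = [5] * 7
print(fz_flags_single_station(station, nbr_median, n_reporting, nint=3, n_stat=5))
# [-1, -1, 1, 1, 1, 0, 0]
```

The flag is raised at the third step, stays at 1 while the station keeps reporting zero, and drops back to 0 once the station reports nonzero rainfall.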

#### Set FZ parameters

In [None]:
nint = 6  # number of time intervals considered
n_stat = 5  # minimum number of neighbours reporting data

#### Run FZ filter

In [None]:
%%time

ds_pws_filtered = pypwsqc.flagging.fz_filter(ds_pws, nint, n_stat, distance_matrix, max_distance)
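
Once flags are available, a typical next step is to count them and to mask the rainfall accordingly. The sketch below works on plain NumPy arrays with made-up values. The helper names are hypothetical, and how the flags are stored in `ds_pws_filtered` is an assumption to verify: inspect `ds_pws_filtered.data_vars` to find the actual name of the flag variable. Keeping only time steps flagged 0 is one possible choice; whether to also keep -1 steps depends on the use case.

```python
import numpy as np


def summarize_flags(flags):
    """Count how many time steps carry each flag value (-1, 0 or 1)."""
    values, counts = np.unique(np.asarray(flags), return_counts=True)
    return {int(v): int(c) for v, c in zip(values, counts)}


def mask_flagged(rain, flags):
    """Keep rainfall only where the flag is 0; everything else becomes NaN."""
    rain = np.asarray(rain, dtype=float)
    return np.where(np.asarray(flags) == 0, rain, np.nan)


# Made-up flags and rainfall for one station
flags = np.array([-1, -1, 0, 0, 1, 1, 0])
rain = np.array([0.0, 0.0, 0.2, 0.0, 0.0, 0.0, 1.1])
print(summarize_flags(flags))
print(mask_flagged(rain, flags))
```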

### Save filtered data

In [None]:
ds_pws_filtered.to_netcdf("fz_filtered_dataset.nc")