# pypwsqc

This notebook shows...

The original R code is available at https://github.com/LottedeVos/PWSQC/. The code used here is a variation of [this Python implementation](https://github.com/NiekvanAndel/QC_radar/blob/main/script_1_0.py) of the original R code.

The idea of ...

Below, we use an open PWS dataset from Amsterdam, called the "AMS PWS" dataset.

In [1]:
import numpy as np
import poligrain as plg
import xarray as xr

import pypwsqc

## Get AMS PWS data

We use this NetCDF...

In [2]:
!curl -OL https://github.com/OpenSenseAction/OS_data_format_conventions/raw/main/notebooks/data/OpenSense_PWS_example_format_data.nc


### Read data and select time of interest

In [2]:
ds_pws = xr.open_dataset("OpenSense_PWS_example_format_data.nc")
ds_pws = ds_pws.sel(time=slice("2017-09-13 00:00", "2017-09-13 03:00"))
ds_pws

### Project coordinates from lon/lat to UTM (EPSG:25832, ETRS89 / UTM zone 32N)

In [3]:
ds_pws.coords["x"], ds_pws.coords["y"] = plg.spatial.project_point_coordinates(
    x=ds_pws.longitude, y=ds_pws.latitude, target_projection="EPSG:25832"
)

### Calculate distance matrix

In [4]:
distance_matrix = plg.spatial.calc_point_to_point_distances(ds_pws, ds_pws)
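`plg.spatial.calc_point_to_point_distances` computes the distance between every pair of stations; with the projected x/y coordinates these should be Euclidean distances in metres. The NumPy sketch below illustrates what such a matrix contains, using three made-up stations. `pairwise_distances` is a hypothetical helper for illustration, not poligrain's implementation.

```python
import numpy as np


def pairwise_distances(x, y):
    """Euclidean distance between every pair of points (hypothetical helper)."""
    xy = np.column_stack([x, y])              # shape (n, 2)
    diff = xy[:, None, :] - xy[None, :, :]    # shape (n, n, 2) via broadcasting
    return np.sqrt((diff**2).sum(axis=-1))    # shape (n, n), zeros on the diagonal


# Three hypothetical stations with projected coordinates in metres
x = np.array([0.0, 3000.0, 0.0])
y = np.array([0.0, 4000.0, 8000.0])
d = pairwise_distances(x, y)
print(d)
```

The diagonal is zero (each station's distance to itself), which is why the neighbour selection below excludes distances equal to 0.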

### Calculate reference per station

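This algorithm... For each station, the neighbours are all other stations within `max_distance` of it. Since the coordinates have been projected to UTM, distances are in metres, so `max_distance = 5e3` corresponds to a radius of 5 km. At each time step, the reference is the median rainfall of these neighbours, and `nbrs_not_nan` is the number of neighbours reporting (non-NaN) rainfall.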
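The neighbour and reference logic can be tried on a tiny example without any PWS data. The sketch below uses plain NumPy arrays instead of xarray; the rainfall values and distances are made up for illustration. `np.nanmedian` ignores missing values, similar to xarray's default `skipna=True` for `median` on float data.

```python
import numpy as np

# Hypothetical toy data: 4 stations x 3 time steps of rainfall (mm), NaN = missing
rain = np.array([
    [0.0, 0.5, np.nan],
    [0.2, 0.6, 1.0],
    [0.4, np.nan, 1.2],
    [9.9, 9.9, 9.9],
])
# Hypothetical symmetric distance matrix in metres (station 3 is far from the others)
dist = np.array([
    [0, 1000, 2000, 9000],
    [1000, 0, 1500, 9000],
    [2000, 1500, 0, 9000],
    [9000, 9000, 9000, 0],
], dtype=float)
max_distance = 5e3

n_not_nan = np.zeros(rain.shape, dtype=int)
reference = np.full(rain.shape, np.nan)
for i in range(rain.shape[0]):
    # Neighbours: other stations (distance > 0) closer than max_distance
    is_nbr = (dist[i] < max_distance) & (dist[i] > 0)
    nbr_rain = rain[is_nbr]  # shape (n_neighbours, n_time)
    # Number of neighbours reporting data at each time step
    n_not_nan[i] = np.sum(~np.isnan(nbr_rain), axis=0)
    # Reference: median of the neighbours, ignoring missing values
    if is_nbr.any():
        reference[i] = np.nanmedian(nbr_rain, axis=0)

print(reference)
print(n_not_nan)
```

Station 3 has no neighbours within 5 km, so its reference stays NaN and its neighbour count is 0; the HI filter would therefore not be applicable there.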

In [5]:
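%%time
ds_pws = ds_pws.load()

max_distance = 5e3  # neighbour search radius in metres (projected coordinates)
nbrs_not_nan = []
reference = []

for pws_id in ds_pws.id.data:
    # Neighbours: all other stations (distance > 0) within max_distance
    neighbor_ids = distance_matrix.id.data[
        (distance_matrix.sel(id=pws_id) < max_distance)
        & (distance_matrix.sel(id=pws_id) > 0)
    ]

    # Number of neighbours reporting (non-NaN) rainfall at each time step
    N = ds_pws.rainfall.sel(id=neighbor_ids).notnull().sum(dim="id")
    nbrs_not_nan.append(N)

    # Reference: median rainfall of the neighbours at each time step
    median = ds_pws.sel(id=neighbor_ids).rainfall.median(dim="id")
    reference.append(median)

ds_pws["nbrs_not_nan"] = xr.concat(nbrs_not_nan, dim="id")
ds_pws["reference"] = xr.concat(reference, dim="id")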

CPU times: total: 1.83 s
Wall time: 1.84 s


In [6]:
ds_pws

## Faulty Zeroes filter

Conditions for raising Faulty Zeroes flag:

* The median rainfall of neighbouring stations within `max_distance` is larger than zero for at least `nint` consecutive time intervals while the station itself reports zero rainfall.

The FZ flag remains 1 until the station reports nonzero rainfall. For the setting of parameter `nint`, see Table 1 in https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2019GL083731
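The FZ rule can be illustrated with a simplified single-station sketch. `faulty_zeros_sketch` is a hypothetical helper, not `pypwsqc.flagging.fz_filter`. It assumes the flag is raised at the `nint`-th consecutive interval in which the station reports zero while the reference is positive, and that missing station data or a non-positive reference interrupts the count without clearing an already raised flag. The library may count the window or treat missing data differently.

```python
import numpy as np


def faulty_zeros_sketch(station, reference, nint):
    """Simplified FZ logic for one station (hypothetical helper, not pypwsqc's fz_filter)."""
    flag = np.zeros(len(station), dtype=int)
    run = 0          # consecutive intervals with station == 0 and reference > 0
    flagged = False  # current FZ state
    for t, (r, ref) in enumerate(zip(station, reference)):
        if r > 0:
            # Nonzero station rainfall clears the flag and resets the count
            flagged = False
            run = 0
        elif r == 0 and ref > 0:
            run += 1
            if run >= nint:
                flagged = True
        else:
            # Missing data or non-positive reference: the run is interrupted,
            # but an already raised flag persists
            run = 0
        flag[t] = int(flagged)
    return flag


station_rain = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.2, 0.0])
ref_rain = np.array([0.5, 0.6, 0.4, 0.0, 0.3, 0.8, 0.5])
print(faulty_zeros_sketch(station_rain, ref_rain, nint=3))
```

The flag is raised at the third zero interval with positive reference, stays raised through the interval where the reference drops to zero, and is cleared when the station reports 1.2 mm.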

In [7]:
# what about nbrs_not_nan?
fz_flag = pypwsqc.flagging.fz_filter(
    pws_data=ds_pws.rainfall,
    reference=ds_pws.reference,
    nint=3,
)

In [15]:
ds_pws["fz_flag"] = (("id", "time"), fz_flag.astype(int))
ds_pws

## High Influx filter

Conditions for raising High Influx flag:

* If the neighbours' median is below threshold `ϕA`, high influx is flagged if the station's rainfall exceeds threshold `ϕB`.
* If the neighbours' median is above `ϕA`, high influx is flagged if the station's rainfall exceeds the median times `ϕB`/`ϕA`.

The filter cannot be applied if fewer than `nstat` neighbours are reporting data; in that case, the HI flag is set to -1.

For settings of the parameters `ϕA`, `ϕB` and `nstat`, see Table 1 in https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2019GL083731
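The two HI conditions and the `nstat` rule can be written as a vectorised NumPy sketch. `high_influx_sketch` and the threshold values in the example are hypothetical and chosen only for illustration; pypwsqc's `hi_filter` may handle missing values and boundary cases differently. At a median of exactly `ϕA` both conditions give the same threshold `ϕB`, so the rule is continuous in the median.

```python
import numpy as np


def high_influx_sketch(station, reference, nbrs_not_nan, phi_a, phi_b, n_stat):
    """Simplified HI logic for one station (hypothetical helper, not pypwsqc's hi_filter).

    Returns 1 (high influx), 0 (no high influx) or -1 (too few neighbours reporting).
    """
    station = np.asarray(station, dtype=float)
    reference = np.asarray(reference, dtype=float)
    nbrs_not_nan = np.asarray(nbrs_not_nan)

    # Low reference (< phi_a): fixed threshold phi_b.
    # Otherwise: reference scaled by phi_b / phi_a.
    threshold = np.where(reference < phi_a, phi_b, reference * phi_b / phi_a)
    # Comparisons involving NaN are False, so missing data yields 0 here
    flag = (station > threshold).astype(int)

    # Filter not applicable where too few neighbours report data
    flag[nbrs_not_nan < n_stat] = -1
    return flag


station_rain = [12.0, 3.0, 60.0, 5.0]
ref_rain = [0.1, 0.2, 2.0, 2.0]
n_reporting = [6, 6, 6, 3]
print(high_influx_sketch(station_rain, ref_rain, n_reporting, phi_a=0.4, phi_b=10.0, n_stat=5))
```

In the example, the first interval is flagged against the fixed threshold (12 > 10), the third against the scaled threshold (60 > 2.0 × 10 / 0.4 = 50), and the last gets -1 because only 3 neighbours report data.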

In [20]:
hi_flag = pypwsqc.flagging.hi_filter(
    pws_data=ds_pws.rainfall,
    nbrs_not_nan=ds_pws.nbrs_not_nan,
    reference=ds_pws.reference,
    hi_thres_a=0.4,
    hi_thres_b=0.2,
    n_stat=5,
)

In [17]:
ds_pws["hi_flag"] = hi_flag

## Station Outlier filter

blablabla

In [7]:
# Set parameters
mint = 4032
mrain = 100
mmatch = 200
gamma = 0.15
beta = 0.2
nstat = 2