# Fractions Skill Score (FSS)
For an explanation of the FSS, and implementation considerations,
see: [Fast calculation of the Fractions Skill Score][fss_ref]

[fss_ref]:
https://www.researchgate.net/publication/269222763_Fast_calculation_of_the_Fractions_Skill_Score

## FSS for a single 2D field
The FSS is computed over 2-D arrays representing the observations and forecasts in the spatial domain. The user must ensure that the input dimensions correspond to the spatial domain, e.g. `lat x lon`. In general, the computation involves sliding a window over the input fields and applying a threshold to the forecast and observation values.

The resulting binary field is summed within each window to give the populace (the number of ones, or "true" values, in that window).

The resulting 2-D field of rolling sums represents the "_integral image_" of the respective forecast and observation fields, which is then aggregated over all the sliding windows to compute the fractions skill score.

The FSS is then roughly defined as:
```
    fss = 1 - sum_w((p_o - p_f)^2) / (sum_w(p_o^2) + sum_w(p_f^2))

    where,
    p_o: observation populace > threshold, in one window
    p_f: forecast populace > threshold, in one window
    sum_w: sum over all windows
```

The implementation details are beyond the scope of this tutorial; please refer to [Fast calculation of the Fractions Skill Score][fss_ref] for more information.
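For intuition only, the rough definition above can be written directly in NumPy. This is a minimal sketch and not the `scores` implementation: the names `window_counts` and `fss_sketch` are hypothetical, only windows that fit fully inside the domain are counted (no padding), and returning 0 when the denominator is zero is an assumption made here for illustration.

```python
import numpy as np


def window_counts(binary, window):
    """Count the ones in every (height, width) window of a 2-D binary field."""
    h, w = window
    # Summed-area table with a leading row and column of zeros
    sat = np.pad(binary.astype(np.int64), ((1, 0), (1, 0)))
    sat = sat.cumsum(axis=0).cumsum(axis=1)
    # Window sum from four corner lookups of the summed-area table
    return sat[h:, w:] - sat[:-h, w:] - sat[h:, :-w] + sat[:-h, :-w]


def fss_sketch(fcst, obs, threshold, window):
    """Illustrative FSS following the rough definition (hypothetical helper)."""
    p_f = window_counts(fcst > threshold, window)
    p_o = window_counts(obs > threshold, window)
    numerator = np.sum((p_o - p_f) ** 2)
    denominator = np.sum(p_o ** 2) + np.sum(p_f ** 2)
    if denominator == 0:
        return 0.0  # assumption: no events in either field scores 0
    return float(1.0 - numerator / denominator)


# Small usage example on random fields
rng = np.random.default_rng(0)
demo_obs = rng.normal(size=(40, 60))
demo_fcst = rng.normal(size=(40, 60))
print(fss_sketch(demo_fcst, demo_obs, threshold=0.5, window=(10, 10)))
```

The summed-area table lets every window sum be read off with four lookups, so the cost does not grow with the window size.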

In summary, computing the FSS for a single field requires the following parameters:
- forecast 2-D field (in spatial domain)
- observations 2-D field (in spatial domain)
- window size (height x width): the size of the sliding window
- threshold: the value the input fields are compared against to generate a binary field
- compute method: (optional) currently only `numpy` is supported

**1. Setup** 

First let's create some random data for our forecast and observation fields. Let's also try out a few scenarios:
```
scenario 1: obs distribution = fcst distribution = N(0, 1)
scenario 2: fcst distribution biased = N(1, 1)
scenario 3: fcst distribution variant = N(0, 2)

where N(mu, sigma) = normal distribution with mean = mu and standard deviation = sigma
```

In [None]:
import numpy as np

# specify input spatial dimensions
num_cols = 600
num_rows = 400

# set seed for reproducibility
np.random.seed(42)

# generate random fields
obs = np.random.normal(loc=0.0, scale=1.0, size=(num_rows, num_cols))
fcst_1 = np.random.normal(loc=0.0, scale=1.0, size=(num_rows, num_cols))
fcst_2 = np.random.normal(loc=1.0, scale=1.0, size=(num_rows, num_cols))
fcst_3 = np.random.normal(loc=0.0, scale=2.0, size=(num_rows, num_cols))

# summarize to sanity check the fields
def _summarize(x, field):
    print(
        "{: >20}: shape={}, mean={:.2f}, stddev={:.2f}".format(
            field, x.shape, np.mean(x), np.std(x)
        )
    )
_summarize(obs, "observations")
_summarize(fcst_1, "forecast scenario 1")
_summarize(fcst_2, "forecast scenario 2")
_summarize(fcst_3, "forecast scenario 3")


**2. Define inputs**

We now need to specify the threshold, window size and compute method. For now, let's choose a single window and a single threshold. While the current `fss` method doesn't allow more than one threshold and window definition per call, we'll see how to calculate multiple thresholds/windows in a later step.

In [None]:
from scores.probability.fss_impl import FssComputeMethod
window_size = (100, 100)  # height * width or row size * col size
threshold = 0.5  # arbitrarily chosen
compute_method = FssComputeMethod.NUMPY  # Note: you can set this to None for now, since only NUMPY is supported currently

print("window_size:", window_size)
print("threshold:", threshold)
print("compute_method:", compute_method)

**3. Run FSS**

Since our inputs only have spatial dimensions, we'll use `scores.probability.fss_impl.fss_2d_single_field` for this purpose.

In [None]:
from scores.probability.fss_impl import fss_2d_single_field

# compile scenarios
scenarios = {
    "scenario 1 (same distribution)": [obs, fcst_1],
    "scenario 2 (biased fcst)": [obs, fcst_2],
    "scenario 3 (variant fcst)": [obs, fcst_3],
}
result = []

# run through each scenario and compute FSS with inputs defined above
for s, v in scenarios.items():
    _obs, _fcst = v
    _fss = fss_2d_single_field(
        _fcst,
        _obs,
        threshold=threshold,
        window=window_size,
        compute_method=compute_method
    )
    result.append((s, _fss))

# tabulate results
print(f"{' '*30} | fss score")
print(f"{' '*30} | ---------")
for s, v in result:
    print("{:<30} | {}".format(s, v))

As shown above, a forecast drawn from the same distribution scores close to 1. This is because the FSS does not depend on where within a given window the binary fields match, only on the total count. With a biased forecast the score drops considerably, which is expected with a threshold of 0.5 and a bias of 1.0. For the variant forecast we still get a reasonable score, which is also expected, since the variation isn't too large and the distributions still overlap substantially.
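A back-of-the-envelope check can make these expectations concrete. The sketch below assumes the window is large enough that every window's populace is close to the domain-wide exceedance fraction times the window area, so sampling noise is ignored and the window area cancels out of the formula. The helper names `exceedance` and `large_window_fss` are hypothetical and only use the standard library.

```python
import math


def exceedance(threshold, mu, sigma):
    """P(X > threshold) for X ~ N(mu, sigma), via the complementary error function."""
    return 0.5 * math.erfc((threshold - mu) / (sigma * math.sqrt(2.0)))


def large_window_fss(frac_obs, frac_fcst):
    """Approximate FSS when every window holds the domain-wide event fraction."""
    numerator = (frac_obs - frac_fcst) ** 2
    denominator = frac_obs ** 2 + frac_fcst ** 2
    return 1.0 - numerator / denominator


threshold = 0.5
frac_obs = exceedance(threshold, mu=0.0, sigma=1.0)
approx = {
    "scenario 1 (same distribution)": large_window_fss(
        frac_obs, exceedance(threshold, mu=0.0, sigma=1.0)
    ),
    "scenario 2 (biased fcst)": large_window_fss(
        frac_obs, exceedance(threshold, mu=1.0, sigma=1.0)
    ),
    "scenario 3 (variant fcst)": large_window_fss(
        frac_obs, exceedance(threshold, mu=0.0, sigma=2.0)
    ),
}
for name, value in approx.items():
    print(f"{name:<30} | {value:.2f}")
```

Under this approximation the three scenarios come out at roughly 1.0, 0.74 and 0.97. Scores computed on actual random fields will differ slightly, because each window carries some sampling noise.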

**4. Multiple inputs**

Suppose now that we want to collate data for multiple thresholds/windows. There are several ways of doing this, including vectorization. The following shows one way that, while more verbose, should help break down the operations required to create the final accumulated dataset.

Let's build the mapping of arguments and run the FSS for each combination. We'll store the results in a `W x T x S` array before converting it to an xarray `DataArray` for display.
```
W x T x S where,
W = number of windows
T = number of thresholds
S = number of scenarios
```

In [None]:
import xarray as xr

# window sizes from (20, 20) to (100, 100) and thresholds from 0.0 to 1.0
window_sizes = np.linspace((20, 20), (100, 100), 5, dtype=int)
thresholds = np.linspace(0.0, 1.0, 5, dtype=float)
input_scenarios = [[obs, fcst_1], [obs, fcst_2], [obs, fcst_3]]
compute_method = FssComputeMethod.NUMPY

# create output array
W = len(window_sizes)
T = len(thresholds)
S = len(input_scenarios)
fss_out = np.zeros((W, T, S))

# let's iterate over every (window, threshold, scenario) index:
# `multi_index` exposes the current index and `writeonly` lets us assign the result
with np.nditer(fss_out, flags=['multi_index'], op_flags=['writeonly']) as it:
    for _fss in it:
        window_idx, threshold_idx, input_idx = it.multi_index
        _window_size = window_sizes[window_idx]
        _threshold = thresholds[threshold_idx]
        _obs, _fcst = input_scenarios[input_idx]
        _fss[...] = fss_2d_single_field(
            _fcst,
            _obs,
            threshold=_threshold,
            window=_window_size,
            compute_method=compute_method
        )

# construct output xarray with results
da = xr.DataArray(
    data=fss_out,
    dims=["window_size", "threshold", "scenario"],
    coords=dict(
        window_size=[str(x) for x in window_sizes],
        threshold=[str(x) for x in thresholds],
        scenario=range(len(input_scenarios))
    ),
    attrs=dict(
        description="Fractions skill score",
    ),
)
da
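The same `W x T x S` grid can also be filled with `itertools.product` instead of `np.nditer`, which some readers may find easier to follow. The sketch below is illustrative only: `run_grid` and `agreement` are hypothetical names, and `agreement` is a stand-in score (the fraction of cells where the thresholded fields agree) used so the example runs without `scores` installed. In practice the FSS call would be passed in its place.

```python
import itertools

import numpy as np


def run_grid(score_fn, window_sizes, thresholds, scenarios):
    """Evaluate score_fn for every (window, threshold, scenario) combination."""
    out = np.empty((len(window_sizes), len(thresholds), len(scenarios)))
    combos = itertools.product(
        enumerate(window_sizes), enumerate(thresholds), enumerate(scenarios)
    )
    for (w, window), (t, threshold), (s, (obs, fcst)) in combos:
        out[w, t, s] = score_fn(
            fcst, obs, threshold=threshold, window=tuple(window)
        )
    return out


def agreement(fcst, obs, threshold, window):
    """Stand-in score (NOT the FSS): ignores `window`, compares binary fields."""
    return float(np.mean((fcst > threshold) == (obs > threshold)))


rng = np.random.default_rng(0)
demo_obs = rng.normal(size=(40, 60))
demo_fcst = rng.normal(size=(40, 60))
grid = run_grid(
    agreement,
    window_sizes=[(5, 5), (10, 10)],
    thresholds=[0.0, 0.5, 1.0],
    scenarios=[(demo_obs, demo_fcst), (demo_obs, demo_obs)],
)
print(grid.shape)
```

Keeping the scoring function as an argument separates the bookkeeping of indices from the metric itself, so the grid logic can be checked with a cheap stand-in before running the full computation.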