%load_ext watermark
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import matplotlib as mpl
import matplotlib.colors
from matplotlib.colors import LinearSegmentedColormap, ListedColormap
import seaborn as sns

import session_config
import reports
import geospatial
import userdisplay as disp
from myst_nb import glue
from IPython.display import display, Markdown

# available data
surveys = session_config.collect_survey_data()



# Use cases



## Examples

### The probability of finding one object

Here we consider the following question:

1. What are the chances of finding at least one _O_ if I go to the beach at _C_?

Here _O_ is any object of interest from the list of items identified on the beach (there are 229 options) and _C_ is a lake or a municipality on a lake.
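
Answering the question starts by reducing each survey to a single yes/no outcome: were any of the codes for _O_ found at _C_? The pandas sketch below is an illustration, not the project's own code. It assumes long-format data with one row per survey and item code (zero counts included) and the hypothetical column names `sample_id`, `location`, `feature`, `date`, `code` and `quantity`; the function `found_at_least_one` and the demo data are also hypothetical.

```python
import pandas as pd


def found_at_least_one(data, codes, feature, start, end, exclude_locations=()):
    """One boolean per survey: was at least one object O found at C?

    Hypothetical long-format data: one row per survey and item code,
    zero counts included, with the columns used below.
    """
    mask = (
        data["code"].isin(codes)
        & (data["feature"] == feature)
        & data["date"].between(pd.Timestamp(start), pd.Timestamp(end))
        & ~data["location"].isin(exclude_locations)
    )
    # Sum the codes of interest within each survey, then test for > 0
    return data[mask].groupby("sample_id")["quantity"].sum() > 0


# Tiny hypothetical data set, illustration only
demo = pd.DataFrame({
    "sample_id": ["a", "a", "b", "b", "c", "c"],
    "location": ["x", "x", "y", "y", "z", "z"],
    "feature": ["lac-leman"] * 6,
    "date": pd.to_datetime(["2020-03-01"] * 2 + ["2020-05-01"] * 2 + ["2018-06-01"] * 2),
    "code": ["G96", "G144"] * 3,
    "quantity": [0, 2, 0, 0, 1, 0],
})
codes = ["G96", "G144"]
likelihood = found_at_least_one(demo, codes, "lac-leman", "2020-01-01", "2021-11-01")

# Prior: earlier campaigns, excluding locations that appear in the likelihood
like_locations = demo.loc[demo["sample_id"].isin(likelihood.index), "location"].unique()
prior = found_at_least_one(demo, codes, "lac-leman", "2015-11-15", "2019-12-31",
                           exclude_locations=like_locations)
print(likelihood.to_dict(), prior.to_dict())
```

Summing the codes within a survey before testing for zero means that finding either `G96` or `G144` counts as a success for that survey.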

This example was first tested in November 2021 at the request of members of an environmental organization visiting Lake Geneva ([finding one object](https://hammerdirt-analyst.github.io/finding-one-object/titlepage.html)). The probability was initially expected to be approximately 40%. The calculation used the Beta-Binomial conjugate pair: instead of considering all the values on the grid, we consider only two outcomes, whether or not the number found was greater than zero. From the general form in (1) we get:

> What is the chance of finding at least one feminine hygiene product at the beach on Lac Léman?
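
With only two outcomes per survey, the Beta-Binomial update reduces to counting successes and failures. The sketch below is an illustration under stated assumptions, not the original calculation: the starting point Beta(1, 1) (`a0`, `b0`), the function name `beta_binomial_at_least_one` and the survey outcomes are hypothetical, and the 90% interval is estimated from posterior draws.

```python
import numpy as np


def beta_binomial_at_least_one(prior_found, likelihood_found, a0=1.0, b0=1.0, seed=0):
    """Posterior chance of finding at least one object (Beta-Binomial pair).

    prior_found / likelihood_found: one boolean per survey (True if count > 0).
    a0, b0: hypothetical Beta(1, 1) starting point before the prior data.
    """
    prior_found = np.asarray(prior_found, dtype=bool)
    likelihood_found = np.asarray(likelihood_found, dtype=bool)

    # Prior: Beta(a0 + successes, b0 + failures) from the earlier campaigns
    a = a0 + prior_found.sum()
    b = b0 + (~prior_found).sum()

    # Conjugate update with the current campaign (the likelihood)
    a_post = a + likelihood_found.sum()
    b_post = b + (~likelihood_found).sum()

    # Posterior mean and a 90% credible interval from posterior draws
    mean = a_post / (a_post + b_post)
    draws = np.random.default_rng(seed).beta(a_post, b_post, size=20_000)
    low, high = np.percentile(draws, [5, 95])
    return mean, low, high


# Hypothetical survey outcomes (G96 + G144 > 0), illustration only
prior_found = [False, True, False, True, False, False, True, False]
likelihood_found = [True, False, False, True, False, True]

mean, low, high = beta_binomial_at_least_one(prior_found, likelihood_found)
print(f"P(at least one) = {mean:.4f}")  # -> P(at least one) = 0.4375
```

Here the prior surveys give Beta(4, 6) and the likelihood adds 3 successes and 3 failures, so the posterior is Beta(7, 9) with mean 7/16.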


````{tab-set}

```{tab-item} Steps to complete the calculation

1. identify the codes for the items of interest: `G96` and `G144`
2. define the region of interest: `lac-leman`
3. define the date range of the likelihood: `{'start':'2020-01-01', 'end':'2021-11-01'}`
4. define the date range of the prior: `{'start':'2015-11-15', 'end':'2019-12-31'}`

### The likelihood and prior

The likelihood data is all the data collected during the current sampling campaign, up to one week before the planned event in Geneva. The prior data is all the data collected in the previous sampling campaigns, excluding results from locations that appear in the likelihood. In both cases only the codes G96 and G144 are considered.



```

```{tab-item} Default parameters and methods

__Default parameters__

1. range of the default index $X = \{ x \in \mathbb{R} \mid x = 0.1k, \; k \in \mathbb{Z}, \; 0 \leq x < 100 \}$
   * or `np.arange(0, 100, 0.1)`
2. Maximum of the forecast grid: the largest observed value, $\max_{i} \{ x_i \}$, or the 99th percentile, $P_{99} = \text{percentile}_{99} \{ x_i \}$
3. The magnitude of the land use for each survey location is categorized in the following manner:

$$
\text{binning}(x) = 
\begin{cases} 
1 & \text{if } -1 \leq x < 0.2 \\
2 & \text{if } 0.2 \leq x < 0.4 \\
3 & \text{if } 0.4 \leq x < 0.6 \\
4 & \text{if } 0.6 \leq x < 0.8 \\
5 & \text{if } 0.8 \leq x \leq 1 
\end{cases},
\text{ where } x \text{ is the proportion of land occupied by a land-use feature}
$$

__Distributions__

The posterior distribution follows from the Dirichlet-Multinomial conjugate pair: for the prior $\theta \sim \text{Dirichlet}(\alpha)$, the posterior is $P(\theta \mid \mathbf{X}) = \text{Dirichlet}(\alpha + \mathbf{n})$, where

1. $\theta$ is the vector of category probabilities, the random variable of the Dirichlet distribution
2. $\mathbf{X}$ is the observed data
3. $\alpha$ is the vector of concentration parameters of the prior Dirichlet distribution
4. $\mathbf{n}$ is the vector of counts from the likelihood

__Forecasted samples__

$$
\begin{align*}
\theta &\sim \text{Dirichlet}(\alpha + \mathbf{n}) \\
\mathbf{X} \mid \theta &\sim \text{Multinomial}(N, \theta)
\end{align*}
$$
```
````
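
The default parameters and the forecasted samples can be sketched with NumPy. This is an illustration, not the project's implementation. It assumes that each survey result is counted on the grid value nearest to it, that the prior parameters $\alpha$ are the prior counts on the same grid plus a hypothetical pseudo-count `base` (so every Dirichlet parameter is strictly positive), and that forecasts are drawn from the posterior $\text{Dirichlet}(\alpha + \mathbf{n})$. The helper names `bin_land_use`, `grid_counts` and `forecast` and the sample data are hypothetical.

```python
import numpy as np

STEP = 0.1
# Default index: 0.0, 0.1, ..., 99.9, i.e. np.arange(0, 100, 0.1)
GRID = np.round(np.arange(0, 100, STEP), 1)


def bin_land_use(x):
    """Map land-use proportions (0 to 1) to the magnitude categories 1-5."""
    # x < 0.2 -> 1, [0.2, 0.4) -> 2, [0.4, 0.6) -> 3, [0.6, 0.8) -> 4, >= 0.8 -> 5
    return np.digitize(x, [0.2, 0.4, 0.6, 0.8]) + 1


def grid_counts(values, grid):
    """Count how many survey results fall on each grid value."""
    # Round each result to the nearest grid step; results above the
    # last grid value are clipped into the last cell
    idx = np.clip(np.round(np.asarray(values) / STEP).astype(int), 0, len(grid) - 1)
    return np.bincount(idx, minlength=len(grid))


def forecast(prior_values, likelihood_values, n_samples=1000,
             use_p99=False, base=0.5, seed=0):
    """Draw forecasted survey results from a Dirichlet-Multinomial model.

    base: hypothetical pseudo-count added to every grid value.
    """
    prior_values = np.asarray(prior_values, dtype=float)
    likelihood_values = np.asarray(likelihood_values, dtype=float)
    all_values = np.concatenate([prior_values, likelihood_values])

    # Maximum of the forecast grid: the largest value or the 99th percentile
    upper = np.percentile(all_values, 99) if use_p99 else all_values.max()
    grid = GRID[GRID <= upper]

    alpha = grid_counts(prior_values, grid) + base  # prior parameters
    n = grid_counts(likelihood_values, grid)        # counts from the likelihood

    rng = np.random.default_rng(seed)
    theta = rng.dirichlet(alpha + n)            # theta ~ Dirichlet(alpha + n)
    counts = rng.multinomial(n_samples, theta)  # X | theta ~ Multinomial(N, theta)
    return np.repeat(grid, counts)              # one forecasted value per draw


# Hypothetical survey results, illustration only
prior = [0.0, 0.1, 0.3, 0.0, 1.2, 0.5]
likelihood = [0.2, 0.0, 0.4, 0.1]

print(bin_land_use([0.0, 0.25, 0.5, 0.79, 1.0]))  # -> [1 2 3 4 5]
samples = forecast(prior, likelihood, n_samples=500)
print(len(samples), samples.min() >= 0.0, samples.max() <= 1.2)  # -> 500 True True
```

`np.digitize` treats each edge as the left-closed start of the next bin, which matches the binning cases, including $x = 1$ falling in category 5.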