In [1]:
%matplotlib inline

import sys
sys.path.append('../')

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd

from water import data
from water.names import BEACHES
from water.viz import plot_exceedances, plot_all_years_site

sns.set_style('darkgrid')
sns.set(rc={'figure.figsize':(13, 9), 'figure.max_open_warning': 50})
sns.set_context('talk')

from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

In [2]:
df = data.load()

In [3]:
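EXCEEDANCE_THRESHOLD = 1000  # samples at or above this value count as an exceedance

mean_exceedance_lengths = {}
for site_name in BEACHES:
    lengths = []  # length in days of every completed exceedance at this site
    site_df = df[site_name].dropna()
    for year in site_df.index.year.unique():
        # Handle each calendar year separately; an exceedance that is still
        # open when the year's samples run out is not counted.
        site_year_df = site_df[(site_df.index >= str(year)) & (site_df.index < str(year + 1))]
        in_exceedance = False
        for i in range(site_year_df.shape[0]):
            if in_exceedance:
                if site_year_df.iloc[i] < EXCEEDANCE_THRESHOLD:
                    # First sample back under the threshold closes the exceedance.
                    in_exceedance = False
                    lengths.append((site_year_df.index[i] - start_date).days)
            if site_year_df.iloc[i] >= EXCEEDANCE_THRESHOLD:
                # start_date is updated on every exceeding sample, so the recorded
                # length is measured from the most recent exceeding sample.
                in_exceedance = True
                start_date = site_year_df.index[i]
    if not lengths:
        print(site_name, 'has no exceedances.')
        continue
    # Number of exceedances that were over after exactly one day.
    num_equal_1 = sum(length == 1 for length in lengths)
    mean_exceedance_lengths[site_name] = [
        num_equal_1 / len(lengths),  # prob(done after 1)
        np.mean(lengths),            # mean exceedance interval, in days
        len(lengths)                 # n, the number of exceedances
    ]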

fast_test_df = (
    pd.DataFrame(mean_exceedance_lengths)
    .T
    .rename(columns={0: 'prob(done after 1)', 1: 'mean exceedance interval', 2: 'n'})
    .sort_values(by=['prob(done after 1)'], ascending=False)
)

stewart has no exceedances.
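
The loop above walks each site-year one sample at a time. As a rough sketch of the same run-finding idea in vectorized pandas, the hypothetical helper below (`exceedance_lengths` is not part of the `water` package) labels consecutive runs of exceeding samples and measures each run from its first exceeding sample to the first later sample under the threshold. It does not split the data by calendar year and it keeps the run's first date as the start, so its numbers need not match the table below; it is an illustration under those assumptions, not a replacement for the cell above.

```python
import pandas as pd

THRESHOLD = 1000  # assumed to be the same cutoff the loop above compares against


def exceedance_lengths(series, threshold=THRESHOLD):
    """Return the length in days of each completed run of exceeding samples.

    A run starts at the first sample >= threshold and ends at the first
    later sample below it. Runs that never end within the data are dropped.
    """
    series = series.dropna()
    above = series >= threshold
    # The run id increases by one every time the above/below state flips.
    run_id = above.astype(int).diff().ne(0).cumsum()
    dates = series.index
    lengths = []
    for _, run in series[above].groupby(run_id[above]):
        start = run.index[0]
        # Samples taken after the last exceeding sample of this run.
        later = dates[dates > run.index[-1]]
        if len(later) == 0:
            continue  # the run was never observed to end
        lengths.append((later[0] - start).days)
    return lengths


# Made-up readings, for illustration only.
example = pd.Series(
    [100, 1500, 200, 1200, 1300, 100, 2000],
    index=pd.to_datetime([
        '2020-06-01', '2020-06-02', '2020-06-03', '2020-06-08',
        '2020-06-09', '2020-06-11', '2020-06-12',
    ]),
)
example_lengths = exceedance_lengths(example)
```

In the made-up series, the first run lasts one day, the second lasts three days, and the final exceeding sample is dropped because no later clean sample closes it.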


The following table shows, for each beach, the probability that an exceedance (i.e. a period during which every test was at or above the beach-closure threshold) lasts only one day.

The probability is derived from the historical data:

$$ \text{probability} = \frac{\text{number of exceedances that lasted only 1 day}}{\text{total number of exceedances}} $$
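
As a small worked instance of this ratio, the sketch below computes it from a list of exceedance lengths. The function name `prob_one_day` and the sample lengths are hypothetical and chosen only for illustration.

```python
def prob_one_day(lengths):
    """Fraction of exceedances that lasted exactly one day."""
    if not lengths:
        # Mirrors the beach with no exceedances: the ratio is undefined.
        raise ValueError('no exceedances, so the probability is undefined')
    return sum(length == 1 for length in lengths) / len(lengths)


# Made-up lengths (in days) for a single beach.
example_lengths = [1, 1, 2, 1, 3, 1, 1, 1]
example_prob = prob_one_day(example_lengths)
print(example_prob)  # 6 of the 8 exceedances lasted one day -> 0.75
```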

We also report the total number of exceedances as the column `n` to give a sense of how trustworthy the probability is (the higher the `n`, the more reliable the estimate).
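
One way to make "trustworthy" concrete is to put an interval around each estimated probability. The sketch below uses the Wilson score interval for a binomial proportion; this is an assumption added here for illustration, not something the notebook computes, and `wilson_interval` is a hypothetical helper. It also treats exceedances as independent trials, which may not hold for real beach data.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    if n <= 0:
        raise ValueError('n must be positive')
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Counts taken from the table: 6 of 6 one-day exceedances and 17 of 21.
low_small_n, high_small_n = wilson_interval(6, 6)
low_large_n, high_large_n = wilson_interval(17, 21)
```

With only six exceedances, an observed probability of 1.0 still leaves a lower bound of roughly 0.61, which is why a high `n` matters when comparing beaches.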

Lastly, we show the average length of exceedances (in days) for each beach.

In [4]:
fast_test_df

| beach | prob(done after 1) | mean exceedance interval | n |
|---|---|---|---|
| bb_clarke | 1.0 | 1.0 | 6.0 |
| brittingham | 1.0 | 1.0 | 4.0 |
| james_madison | 0.916667 | 1.416667 | 12.0 |
| spring_harbor | 0.875 | 1.375 | 16.0 |
| warner | 0.857143 | 1.571429 | 7.0 |
| tenney | 0.842105 | 1.789474 | 19.0 |
| olin | 0.809524 | 1.666667 | 21.0 |
| bernies | 0.777778 | 1.444444 | 9.0 |
| mendota_co_park | 0.777778 | 1.555556 | 9.0 |
| goodland | 0.764706 | 1.588235 | 17.0 |


From this, we can see that seven beaches have at least a 4 in 5 chance that an exceedance lasts only one day.

One of the beaches, `stewart`, had no exceedances and thus no estimated historical probability.