# Depth of snow cover in Kaisaniemi, Helsinki

This notebook investigates the depth of snow in Kaisaniemi, Helsinki, over the last 60 years. In particular, we look into the probability that the snow depth is more than 0 cm on one specific day of each year.



The data originally comes from the Finnish Meteorological Institute (FMI): https://ilmatieteenlaitos.fi/havaintojen-lataus#!/

In [0]:
import pandas as pd
import pystan
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.special as ss

%matplotlib inline

In [0]:
# Load and clean up the data:
# - overwrite the original column names
# - build a date from the separate year, month and day columns
# - clip negative snow depths to 0
# About the snow measurement (translated from the Finnish description):
# The recorded snow depth is measured in the morning at 8:00 local time (9:00 during daylight saving time).
# Value -1 = no snow. Value 0 = no snow at the observation station, but there is snow on open ground around it.
df = (pd.read_csv("https://raw.githubusercontent.com/dins/snow-depth/master/kaisaniemi.csv", 
                 names=['year', 'month', 'day', 'clock', 'tzone', 'snow', 'temp'],
                 header=0)
                .assign(date = lambda d: pd.to_datetime(d[['year', 'month', 'day']]),
                        snow = lambda d: d['snow'].clip(0),
                        is_snow = lambda d: d['snow'] > 0)
                  [['date', 'snow', 'is_snow', 'temp']])
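
To make the effect of the cleanup concrete, here is a minimal sketch on a hand-made frame. The four rows and the name `toy` are made up for illustration and are not taken from the Kaisaniemi data; only the `assign` steps mirror the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical rows covering the special codes:
# -1 (no snow), 0, a real depth, and a missing measurement.
toy = pd.DataFrame({
    'year': [2018, 2018, 2018, 2018],
    'month': [12, 12, 12, 12],
    'day': [21, 22, 23, 24],
    'snow': [-1, 0, 5, np.nan],
})

toy = toy.assign(
    date=lambda d: pd.to_datetime(d[['year', 'month', 'day']]),  # one datetime column from three parts
    snow=lambda d: d['snow'].clip(lower=0),                      # -1 becomes 0, NaN stays NaN
    is_snow=lambda d: d['snow'] > 0,                             # evaluated on the already clipped column
)[['date', 'snow', 'is_snow']]

print(toy)
```

A missing depth compares as `False` here, which is one reason to drop rows without a measurement before modelling.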

In [0]:
df.tail()

In [0]:
# Look into a specific day of the year (December 24)
# Remove years without a snow depth measurement
christmas = df.loc[lambda d: ~d['snow'].isnull() & (d['date'].dt.day == 24) & (d['date'].dt.month == 12)]

In [0]:
christmas.head()

In [0]:
sns.scatterplot(x="date", y="snow", hue="is_snow", data=christmas)

In [0]:
stan_data = christmas.assign(decade=lambda d: (d['date'].dt.year - 2000) / 10,
                             is_snow=lambda d: d['is_snow'].astype(int))[['decade', 'is_snow']]

The logit function maps probabilities in the open interval (0, 1) to the whole real line (-infinity, infinity): https://en.wikipedia.org/wiki/Logit

We use it here so that the model parameter can be any real number: `bernoulli_logit(b)` interprets `b` as the logit of the snow probability.
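
As a quick illustration of the transformation, the sketch below applies `scipy.special.logit` and its inverse `scipy.special.expit` to a few hand-picked probabilities. The values in `p` are arbitrary examples, not results from the data.

```python
import numpy as np
import scipy.special as ss

p = np.array([0.1, 0.5, 0.9])  # arbitrary example probabilities

x = ss.logit(p)        # log(p / (1 - p)): probability scale -> real scale
p_back = ss.expit(x)   # 1 / (1 + exp(-x)): real scale -> probability scale

print(x)
print(p_back)
```

A probability of 0.5 maps to 0, and probabilities that are symmetric around 0.5 map to values that are symmetric around 0.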

In [0]:
first_model_code = '''
data {
   int N;
   int<lower=0, upper=1> is_snow[N];
}
parameters {
   real b;
}
model {
  for (i in 1:N) {
    is_snow[i] ~ bernoulli_logit(b);
  }
}
'''

In [0]:
model = pystan.StanModel(model_code=first_model_code)
fit = model.sampling(data={'N': len(stan_data), **stan_data.to_dict(orient='list')}, iter=1000, chains=4)
fit

The posterior distribution summarizes what we know about the parameter b after the data has been observed.

In [0]:
# Extract the posterior draws for the parameter b.
# These sample values represent our posterior distribution.
b_posterior = fit.extract('b')['b']
sns.distplot(b_posterior)  # looks approximately normally distributed

Let's undo the logit transformation, i.e. convert the b values from the real scale to the probability scale.

In [0]:
p_posterior = ss.expit(b_posterior) # inverse logit
sns.distplot(p_posterior)
plt.axvline(np.mean(p_posterior), 0, linestyle="--")
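
Besides plotting, the draws can be condensed into a few numbers. The following is a minimal sketch of a posterior mean and a central 95% interval on the probability scale. The array `b_draws` is a synthetic stand-in generated with NumPy, not output from the model above, so the printed numbers say nothing about the real data.

```python
import numpy as np
import scipy.special as ss

rng = np.random.default_rng(0)
# Synthetic stand-in for posterior draws of b (an assumption for illustration only).
b_draws = rng.normal(loc=0.8, scale=0.3, size=2000)

p_draws = ss.expit(b_draws)                      # transform every draw to the probability scale
p_mean = p_draws.mean()                          # posterior mean of the probability
low, high = np.percentile(p_draws, [2.5, 97.5])  # central 95% interval

print(p_mean, low, high)
```

Transforming each draw first and summarizing afterwards keeps the summary on the scale we actually care about.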

In [0]:
second_model_code = '''
data {
  int N;
  int<lower=0, upper=1> is_snow[N];
  real decade[N]; 
}
parameters {
  real b;
  real k; 
}
model {
  for (i in 1:N) {
    is_snow[i] ~ bernoulli_logit(k * decade[i] + b);
  }
}
generated quantities {
  real prob[N];
  for (i in 1:N) {
    prob[i] = inv_logit(k* decade[i] + b);
  }
}
'''

In [0]:
model2 = pystan.StanModel(model_code=second_model_code)
fit2 = model2.sampling(data={'N': len(stan_data), **stan_data.to_dict(orient='list')}, iter=1000, chains=4)
fit2

In [0]:
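# Has the snow probability decreased between the first and the last observed year?
# prob has one column per observation (rows are in date order), so compare the
# first and last columns instead of hard-coding the indices 1 and 60.
prob = fit2.extract('prob')['prob']
diff_samples = prob[:, 0] - prob[:, -1]
# posterior probability of a decrease
np.mean(diff_samples > 0)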
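
Because the inverse logit is monotonically increasing, the probability for an earlier year exceeds the probability for a later year exactly when the slope `k` is negative. The sketch below checks this on synthetic stand-in draws; `b_draws`, `k_draws` and the two decade values are assumptions made for illustration, not output from the fit.

```python
import numpy as np
import scipy.special as ss

rng = np.random.default_rng(0)
# Synthetic stand-ins for posterior draws of b and k (NOT taken from the fitted model).
b_draws = rng.normal(0.5, 0.3, size=2000)
k_draws = rng.normal(-0.2, 0.15, size=2000)

# Hypothetical decade values for the first and the last observation.
decade_first, decade_last = -4.0, 1.8

prob_first = ss.expit(k_draws * decade_first + b_draws)
prob_last = ss.expit(k_draws * decade_last + b_draws)

p_decrease = np.mean(prob_first - prob_last > 0)  # share of draws with a decrease
p_negative_slope = np.mean(k_draws < 0)           # share of draws with a negative slope

print(p_decrease, p_negative_slope)
```

Under these assumptions the two fractions agree, so the posterior of `k` alone already answers the question of whether the probability has decreased.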

In [0]:
# Now estimate the snow probability for 2019
decade_2019 = (2019 - 2000) / 10
post_draws = fit2.extract(['b', 'k'])
predictions = ss.expit(post_draws['b'] + post_draws['k'] * decade_2019)

In [0]:
np.mean(predictions)

In [0]:
sns.distplot(predictions)
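
The same calculation can be wrapped in a small function to predict any year. This is only a sketch: `snow_probability_draws` is a hypothetical helper name, and the draws are synthetic stand-ins rather than the output of `fit2.extract`.

```python
import numpy as np
import scipy.special as ss

def snow_probability_draws(b_draws, k_draws, year):
    """Hypothetical helper: draws of the snow probability for a given year."""
    decade = (year - 2000) / 10  # same scaling as the `decade` column
    return ss.expit(b_draws + k_draws * decade)

rng = np.random.default_rng(1)
# Synthetic stand-ins for posterior draws (assumed values, not from the model).
b_draws = rng.normal(0.5, 0.3, size=2000)
k_draws = rng.normal(-0.2, 0.15, size=2000)

for year in (1960, 1990, 2019):
    draws = snow_probability_draws(b_draws, k_draws, year)
    print(year, round(float(draws.mean()), 2))
```

With real draws, one would pass `post_draws['b']` and `post_draws['k']` in place of the synthetic arrays.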