# Bayesian Changepoint Model
* Do the sales numbers change at some point during the observed period? That is, do weekly sales increase or decrease, and at which point in time can the change be located with reasonable certainty?
* This basic example follows the excellent book "Bayesian Methods for Hackers".

In [None]:
import pandas as pd
import pymc3 as pm
import arviz as az
import numpy as np

In [None]:
import theano

In [None]:
theano.__version__

In [None]:
df = pd.read_csv("../input/iowa-liquor-sales/Iowa_Liquor_Sales.csv")

In [None]:
# .copy() avoids pandas' SettingWithCopyWarning when the new column is added
df_part = df[df.City == "IOWA CITY"].copy()
df_part["Purchasedate"] = pd.to_datetime(df_part.Date, format="%m/%d/%Y")
df_part = df_part.sort_values(by="Purchasedate")
df_part = df_part[df_part.Purchasedate.dt.year == 2014]

In [None]:
df_part

In [None]:
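# dt.weekofyear was removed in pandas 2.0; isocalendar().week gives the same ISO week numbers
weekly = df_part.groupby(df_part.Purchasedate.dt.isocalendar().week)["Bottles Sold"].agg("sum")
weekly.plot(kind="bar", figsize=(12,4), title="Bottles sold per week");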

The goal is to estimate the rate at which sales occur before a change point (`lambda1`) and the rate after this shift (`lambda2`).
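
Written out with the names used in the model code further down (`lambda1`, `lambda2`, `tau`, `alpha`), the model is roughly the following. Here $C_i$ is the number of bottles sold at position $i$ of the weekly series, $n$ is the number of weeks, and $\bar{C}$ is the average weekly count; these symbols are introduced only for this summary.

$$
\lambda_1 \sim \text{Exponential}(\alpha), \qquad \lambda_2 \sim \text{Exponential}(\alpha), \qquad \alpha = \frac{1}{\bar{C}}
$$

$$
\tau \sim \text{DiscreteUniform}(0, n)
$$

$$
C_i \sim \text{Poisson}(\lambda_i), \qquad
\lambda_i =
\begin{cases}
\lambda_1 & \text{if } i < \tau \\
\lambda_2 & \text{if } i \ge \tau
\end{cases}
\qquad i = 0, \dots, n-1
$$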

To achieve this, MCMC sampling draws from the joint posterior of the two rates and the change point (`tau`), where every position from the beginning of the dataset to the end is a candidate for `tau`.
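
The idea of scoring every candidate change point can be illustrated without MCMC. The following is a minimal sketch on synthetic counts (not the Iowa data); the function names and the rates 200 and 300 are made up for illustration. It plugs each segment's mean in as its rate and picks the candidate with the highest Poisson log-likelihood, which is a simpler, non-Bayesian stand-in for what the sampler explores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic weekly counts (NOT the Iowa data): rate 200 for 30 weeks, then rate 300
true_tau = 30
counts = np.concatenate([rng.poisson(200, true_tau), rng.poisson(300, 52 - true_tau)])


def poisson_loglik(segment, rate):
    # Poisson log-likelihood without the log(k!) term, which is the same for every tau
    return np.sum(segment * np.log(rate) - rate)


def profile_loglik(counts):
    """Score each candidate tau with every segment's rate set to that segment's mean."""
    n = len(counts)
    scores = np.full(n, -np.inf)
    for tau in range(1, n):  # keep both segments non-empty
        before, after = counts[:tau], counts[tau:]
        scores[tau] = (poisson_loglik(before, before.mean())
                       + poisson_loglik(after, after.mean()))
    return scores


scores = profile_loglik(counts)
best_tau = int(np.argmax(scores))
print("best candidate position:", best_tau)
```

Unlike this point estimate, the MCMC approach returns a full distribution over `tau` and both rates, so the uncertainty about the change point is visible as well.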



In [None]:
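# An Exponential(alpha) prior has mean 1/alpha, set here to the average weekly count
alpha = 1 / weekly.mean()

with pm.Model() as model:
    # Poisson rates before and after the switchpoint
    lambda_before = pm.Exponential("lambda1", alpha)
    lambda_after = pm.Exponential("lambda2", alpha)

    # Switchpoint position within `weekly`; both bounds are inclusive
    tau = pm.DiscreteUniform("tau", lower=0, upper=len(weekly))

    # Positions below tau use lambda_before, all others use lambda_after
    idx = np.arange(len(weekly))
    lambda_ = pm.math.switch(tau > idx, lambda_before, lambda_after)

    observations = pm.Poisson("obs", lambda_, observed=weekly.values)

    # Metropolis can sample the discrete tau as well as the continuous rates
    step = pm.Metropolis()
    trace = pm.sample(15000, tune=5000, step=step)

# Posterior samples as NumPy arrays
lambda_1_samples = trace["lambda1"]
lambda_2_samples = trace["lambda2"]
tau_samples = trace["tau"]

# ArviZ needs the model context to convert a PyMC3 trace
with model:
    az.plot_trace(trace, legend=True);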

The trace plot makes it fairly obvious that the switchpoint lies in week 33.
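
Reading the switchpoint off a plot can be backed up with numbers. Note that `tau` is a zero-based position in `weekly`, not a week number, so it has to be mapped through the week labels. Below is a minimal sketch of such a summary; `summarize_switchpoint` is a hypothetical helper, and the samples are stand-ins for illustration only, not real sampler output.

```python
import numpy as np


def summarize_switchpoint(tau_samples, week_labels):
    """Return (most probable position, its week label, its posterior probability)."""
    tau_samples = np.asarray(tau_samples, dtype=int)
    # Posterior mass per candidate position; tau may equal len(week_labels)
    probs = np.bincount(tau_samples, minlength=len(week_labels) + 1) / len(tau_samples)
    tau_mode = int(np.argmax(probs))
    # tau is the first position that uses the "after" rate;
    # tau == len(week_labels) means no change inside the data
    label = int(week_labels[tau_mode]) if tau_mode < len(week_labels) else None
    return tau_mode, label, float(probs[tau_mode])


# Stand-in samples (NOT real sampler output): most of the mass sits on position 20
fake_tau = np.array([20] * 70 + [19] * 20 + [21] * 10)
weeks = np.arange(1, 53)  # stand-in for weekly.index
print(summarize_switchpoint(fake_tau, weeks))
```

With the real run, the call would be `summarize_switchpoint(tau_samples, weekly.index.to_numpy())`, assuming `tau_samples` and `weekly` from the cells above.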