### TODOs

- [ ] Increase chart size
- [ ] Make notes for the presentation
- [ ] Data analysis: simulate real entrance frequency
- [ ] Data analysis: plot time in a section
- [x] Package simulation as a class
- [ ] Plotly instead of pandas plots

# Project: Monte Carlo Markov Chain Simulation

## Business goals:  

1. understand customer behavior  
2. explain customer behavior to non-data staff  
3. optimize staffing so that the queues do not get unnecessarily long  

## Supermarket Area

We are using the following model supermarket with six areas: entrance, fruit, spices, dairy, drinks and checkout.

The customers can move between these areas freely. Sooner or later, they will enter the checkout area. Once they do, they are considered to have left the shop.

![Model supermarket areas](./supermarket.png)

### Load data

In [None]:
import pandas as pd
import os

def load_day_data(day):
    return load_data(os.path.join('./data/', day + '.csv'))

def load_data(path):
    return pd.read_csv(path, sep=';', parse_dates=['timestamp'], index_col=['timestamp'])

df = load_day_data('monday')
for file in ['tuesday', 'wednesday', 'thursday', 'friday']:
    df_next = load_day_data(file)
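    # Offset customer numbers so they stay unique across days
    df_next['customer_no'] = df_next['customer_no'] + df['customer_no'].max()
    # DataFrame.append was removed in pandas 2.0; pd.concat is its replacement
    df = pd.concat([df, df_next])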

# df

### Enrich data

In [None]:
def resample_transitions(df):
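    # One row per customer and minute, carrying the last known location forward.
    # ffill() replaces the deprecated pad(); errors='ignore' covers pandas versions
    # that already exclude the grouping column from the result.
    df_full = df.groupby(by=['customer_no']).resample('1min').ffill().drop(columns=['customer_no'], errors='ignore').reset_index()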
    df_full['location_before'] = df_full.groupby(by=['customer_no'])['location'].shift(fill_value='entrance')

    # When the shop closes, the remaining customers are rushed through the checkout.
    # Their checkout is not recorded, so it may look as if they stay in the market forever.
    # Here we add a final transition to the checkout, one minute later, for such customers.
    last_locations = df_full.groupby(by='customer_no')[['timestamp', 'location']].last()
    missing_checkouts = last_locations[last_locations['location'] != 'checkout'].copy()
    missing_checkouts['timestamp'] = missing_checkouts['timestamp'] + pd.Timedelta(minutes=1)
    missing_checkouts['location_before'] = missing_checkouts['location']
    missing_checkouts['location'] = 'checkout'
    missing_checkouts.reset_index(inplace=True)

    # DataFrame.append was removed in pandas 2.0; pd.concat is its replacement
    return pd.concat([df_full, missing_checkouts])

df_full = resample_transitions(df)
# df_full
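
To make the enrichment step easier to follow, here is a minimal plain-Python sketch of the same idea for a single customer: hold each location until the next record, derive the previous location for every minute, and append a checkout if none was recorded. The function name `minute_transitions`, its parameters, and the sample timestamps are illustrative assumptions and not part of the notebook.

```python
from datetime import datetime, timedelta


def minute_transitions(records, start="entrance", end="checkout"):
    """Expand one customer's (timestamp, location) records to one row per minute.

    Returns (timestamp, location_before, location) tuples.
    Hypothetical helper for illustration only.
    """
    records = sorted(records)
    step = timedelta(minutes=1)
    rows = []
    previous = start
    for i, (ts, location) in enumerate(records):
        # Hold this location until the next record arrives (forward fill).
        next_ts = records[i + 1][0] if i + 1 < len(records) else ts + step
        while ts < next_ts:
            rows.append((ts, previous, location))
            previous = location
            ts += step
    # Close the path if the last recorded location is not the checkout.
    if rows and rows[-1][2] != end:
        last_ts, _, last_location = rows[-1]
        rows.append((last_ts + step, last_location, end))
    return rows


# Made-up sample: the customer is seen in dairy, then three minutes later in fruit.
t0 = datetime(2019, 9, 2, 7, 3)
records = [(t0, "dairy"), (t0 + timedelta(minutes=3), "fruit")]
for row in minute_transitions(records):
    print(row)
```

The sample yields five rows: three minutes in dairy, one in fruit, and the added transition from fruit to checkout.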

## Data analysis

### When do customers enter the supermarket?

In [None]:
def show_enter_time_distribution(df):
    """ plots daily avg distribution customers entering the store per hour """
    
    tmp = df.reset_index()
    days_cnt = len(df.index.day.unique())
    tmp['counter'] = 1 / days_cnt
    tmp.groupby([tmp['timestamp'].dt.hour])[['counter']].sum().plot.bar()
    
show_enter_time_distribution(df)

### How long do customers spend in the supermarket?

In [None]:
def show_time_in_supermarket_distribution(df):
    """Plot how many customers stayed for each total duration (last minus first timestamp)."""
    g = df.reset_index().groupby(['customer_no'])[['timestamp']]
    time_in_market = g.last() - g.first()
    time_in_market['counter'] = 1
    time_in_market.groupby(['timestamp']).count().plot()

show_time_in_supermarket_distribution(df_full)

### Probabilities plot

What are the chances of being in a specific section?
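
One possible way to answer this, sketched here under the assumption that each minute a customer spends in a section counts as one observation, is the share of customer-minutes per section. The helper `section_shares` and the sample list are hypothetical.

```python
from collections import Counter


def section_shares(locations):
    """Return the fraction of observations that fall into each section."""
    counts = Counter(locations)
    total = sum(counts.values())
    return {section: n / total for section, n in sorted(counts.items())}


# Made-up minute-by-minute locations of two customers.
minutes = ["dairy", "dairy", "fruit", "checkout",
           "spices", "drinks", "drinks", "checkout"]
shares = section_shares(minutes)
for section, share in shares.items():
    print(f"{section}: {share:.3f}")
```

On the resampled data, `df_full['location'].value_counts(normalize=True)` should give the same kind of result.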

### How many sections does a customer visit before leaving the supermarket?
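
A minimal sketch of how this could be counted from a minute-by-minute path, assuming that consecutive minutes in the same section count as one visit and that the checkout itself is not counted. The helper `sections_visited` is hypothetical.

```python
from itertools import groupby


def sections_visited(path, end="checkout"):
    """Return (number of visits, number of distinct sections) for one customer path."""
    # groupby collapses consecutive repeats, so a three-minute stay is one visit.
    visits = [location for location, _ in groupby(path) if location != end]
    return len(visits), len(set(visits))


# Made-up path: dairy for two minutes, then fruit, back to dairy, then checkout.
path = ["dairy", "dairy", "fruit", "dairy", "checkout"]
print(sections_visited(path))
```

Whether a return to an earlier section should count as a new visit is a modelling choice; the function returns both numbers so either reading is available.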

### Revenue Estimate

## Markov chain

### Transition matrix

In [None]:
# normalize='index' (same as 0) divides each row by its total,
# so each row holds P(next section | current section) and sums to 1
crosstab = pd.crosstab(df_full['location_before'], df_full['location'], normalize='index')
crosstab = crosstab.reindex(sorted(crosstab.columns), axis=1)
crosstab = crosstab.reindex(sorted(crosstab.index), axis=0)
crosstab.to_csv('./output/transition_matrix.csv', sep=';')
crosstab
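
The row normalization can be illustrated without pandas. The sketch below counts `(location_before, location)` pairs and divides each count by its row total, which is the same estimate the crosstab produces. The function name and the sample pairs are made up for illustration.

```python
from collections import Counter, defaultdict


def transition_probabilities(pairs):
    """Estimate P(next | current) from (current, next) pairs by row-normalized counts."""
    counts = defaultdict(Counter)
    for before, after in pairs:
        counts[before][after] += 1
    return {
        before: {after: n / sum(row.values()) for after, n in row.items()}
        for before, row in counts.items()
    }


# Made-up transitions, not taken from the supermarket data.
pairs = [
    ("entrance", "dairy"), ("dairy", "dairy"), ("dairy", "fruit"),
    ("fruit", "checkout"), ("entrance", "fruit"), ("fruit", "fruit"),
]
P = transition_probabilities(pairs)
for before, row in P.items():
    print(before, row)
```

There is no row for the checkout in this sample because nothing is recorded after a customer reaches it.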

### Monte Carlo simulations

#### Simulation 1

In [None]:
from mcmc_simulator import McmcSimulator

In [None]:
matrix = pd.read_csv('./output/transition_matrix.csv', index_col=0, sep=';')

simulation1_file = './output/simulation-1.csv'

McmcSimulator.run(
    matrix,
    output_file=simulation1_file,
    entrance_distribution=lambda clock: 1
)
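
The internals of `McmcSimulator` are not shown in this notebook. As a rough illustration of the underlying idea only, the sketch below walks a single customer through a transition matrix until the checkout is reached. The function `simulate_customer`, its parameters, and the toy probabilities are assumptions for this example and do not describe the actual `mcmc_simulator` module.

```python
import random


def simulate_customer(matrix, start="entrance", end="checkout", rng=None, max_steps=10_000):
    """Sample one customer path from a dict-of-dicts transition matrix."""
    rng = rng or random.Random()
    path = [start]
    # Keep drawing the next section until the absorbing checkout state is reached.
    while path[-1] != end and len(path) <= max_steps:
        row = matrix[path[-1]]
        next_section = rng.choices(list(row), weights=list(row.values()))[0]
        path.append(next_section)
    return path


# Toy probabilities, made up for illustration (each row sums to 1).
toy_matrix = {
    "entrance": {"dairy": 0.5, "fruit": 0.5},
    "dairy": {"dairy": 0.6, "fruit": 0.2, "checkout": 0.2},
    "fruit": {"fruit": 0.5, "dairy": 0.2, "checkout": 0.3},
}
path = simulate_customer(toy_matrix, rng=random.Random(42))
print(" -> ".join(path))
```

Passing a seeded `random.Random` makes a run reproducible, which helps when comparing simulated and observed distributions.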

In [None]:
df_s1 = load_data(simulation1_file)
df_s1_full = resample_transitions(df_s1)

In [None]:
show_enter_time_distribution(df_s1)
# df_s1

In [None]:
show_time_in_supermarket_distribution(df_s1_full)

#### Simulation 2

In [None]:
simulation2_file = './output/simulation-2.csv'

McmcSimulator.run(
    matrix,
    output_file=simulation2_file,
    entrance_distribution=lambda clock: 2
)

In [None]:
df_s2 = load_data(simulation2_file)
df_s2_full = resample_transitions(df_s2)

In [None]:
show_enter_time_distribution(df_s2)

In [None]:
show_time_in_supermarket_distribution(df_s2_full)

#### Simulation 3