# Project: Markov Simulation

## Business goals:  

1. understand customer behavior  
2. explain customer behavior to non-data staff  
3. optimize staffing so that the queues do not get unnecessarily long  

## Supermarket Area

We are using the following model supermarket with six areas: entrance, fruit, spices, dairy, drinks and checkout.

The customers can move between these areas freely. Sooner or later, they will enter the checkout area. Once they do, they are considered to have left the shop.

![Supermarket layout](./project/supermarket.png)
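
In Markov terms, the movement rules above can be summarised as a transition probability matrix in which `checkout` is an absorbing state. The following is a minimal sketch on a small made-up frame; the rows, the `next_location` column and the variable names are assumptions for illustration, not part of the project data.

```python
import pandas as pd

# toy data (made up): three customers moving through the shop
toy = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2024-01-01 09:00', '2024-01-01 09:01', '2024-01-01 09:03',
        '2024-01-01 09:00', '2024-01-01 09:02',
    ]),
    'customer_no': [1, 1, 1, 2, 2],
    'location': ['fruit', 'dairy', 'checkout', 'fruit', 'checkout'],
})

toy = toy.sort_values(['customer_no', 'timestamp'])

# the next location of the same customer (NaN after the last record)
toy['next_location'] = toy.groupby('customer_no')['location'].shift(-1)

# once at the checkout the customer has left, so checkout leads to checkout
toy['next_location'] = toy['next_location'].fillna('checkout')

# share of moves from each location (rows) to each next location (columns)
transition_matrix = pd.crosstab(
    toy['location'], toy['next_location'], normalize='index'
)
print(transition_matrix)
```

Each row of `transition_matrix` sums to 1, and the `checkout` row puts all of its probability on `checkout`.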

## 8.1. Data Analysis

### Load data

In [None]:
import pandas as pd

In [None]:
from os import listdir
from os.path import isfile, join

def load_data(i, path, files):
    # read the i-th CSV file; the timestamp column is parsed into datetimes
    return pd.read_csv(join(path, files[i]), sep=';', parse_dates=['timestamp'])

path = './project/data/'

files = [f for f in listdir(path) if isfile(join(path, f))]

# load first file
df = load_data(0, path, files)

# join data from all remaining files
for i in range(1, len(files)):
    
    df_next = load_data(i, path, files)
    
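    # offset customer_no by the number of rows loaded so far to keep ids unique across files
    df_next['customer_no'] = df_next['customer_no'] + len(df)

    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    df = pd.concat([df, df_next])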

df

In [None]:
# the total number of recorded visits in each section (customers are not deduplicated)
df.groupby(by='location')['customer_no'].count()

### Fill in missing checkout time

In [None]:
# When the shop closes, the remaining customers are rushed through the checkout. 
# Their checkout is not recorded, so it may look as if they stay in the market forever.
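
One possible way to fill the gap, sketched on a small made-up frame: find every customer whose last recorded location is not `checkout` and append a synthetic checkout row for them. The toy rows and the `closing_time` value are assumptions for illustration; the real closing time has to come from the data.

```python
import pandas as pd

# toy data (made up): customer 2 never reaches the checkout
toy = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2024-01-01 21:45', '2024-01-01 21:47', '2024-01-01 21:48',
    ]),
    'customer_no': [1, 1, 2],
    'location': ['fruit', 'checkout', 'dairy'],
})

# assumed closing time, not taken from the project data
closing_time = pd.Timestamp('2024-01-01 22:00')

# last recorded row of every customer
last_rows = toy.sort_values('timestamp').groupby('customer_no').tail(1)

# customers whose last location is not the checkout get a checkout row
missing = last_rows[last_rows['location'] != 'checkout'].copy()
missing['location'] = 'checkout'
missing['timestamp'] = closing_time

toy_filled = (
    pd.concat([toy, missing])
      .sort_values(['customer_no', 'timestamp'])
      .reset_index(drop=True)
)
print(toy_filled)
```

After this step every customer's last row is a checkout, so no visit appears to last forever.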

In [None]:
# df.iloc[25]['location']

# def get_last_id(df):
#     return df.index[-1]

# df[:100].groupby('customer_no').agg({'location': get_last_id})

In [None]:
# df.iloc[0:57]
df[df['customer_no'] == 6]

In [None]:
# # df.pivot(columns=['location'], values=['timestamp'], index=['customer_no'])

# def get_last(df):
#     print(df)
#     return df.index[[0, -1]]

# df.iloc[0:56].pivot(index=['customer_no'], columns=['location'], values=['timestamp']) # df.iloc[0:57] will fail
# df.iloc[0:57].groupby(['customer_no', 'location']).agg({'location': get_last})
# df.iloc[0:57]

In [None]:
# Calculate the total number of customers in each section over time

# Display the number of customers at checkout over time
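
A minimal sketch of both tasks on made-up data: count the distinct customers recorded per minute and section, then pick the `checkout` column. It counts the rows logged in each minute; if the data only logs location changes, each customer would first have to be expanded to one row per minute. The toy rows and variable names are assumptions.

```python
import pandas as pd

# toy data (made up)
toy = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2024-01-01 09:00', '2024-01-01 09:00', '2024-01-01 09:01',
        '2024-01-01 09:02', '2024-01-01 09:02',
    ]),
    'customer_no': [1, 2, 1, 2, 3],
    'location': ['fruit', 'dairy', 'checkout', 'checkout', 'fruit'],
})

# round timestamps down to the full minute
minute = toy['timestamp'].dt.floor('min')

# distinct customers per minute (rows) and section (columns)
per_section = (
    toy.groupby([minute, 'location'])['customer_no']
       .nunique()
       .unstack(fill_value=0)
)

# customers recorded at the checkout over time
checkout_over_time = per_section['checkout']
print(per_section)
```

With matplotlib installed, `checkout_over_time.plot()` would display the checkout series as a line chart.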

In [None]:
# The time each customer spent in the market
visits = df.groupby(by='customer_no')['timestamp'].agg(['min', 'max'])
visits['duration'] = visits['max'] - visits['min']
visits.sort_values(by='duration', ascending=False)

In [None]:
# Calculate the total number of customers in the supermarket over time.
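
One way to approach this, assuming a per-customer table with `min` and `max` timestamps like the one computed in the previous cell: treat each arrival as +1 and each departure as -1, then take the running sum. The toy values below are made up.

```python
import pandas as pd

# toy stand-in (made up) for the per-customer first and last timestamps
visits = pd.DataFrame({
    'min': pd.to_datetime(['2024-01-01 09:00', '2024-01-01 09:01', '2024-01-01 09:03']),
    'max': pd.to_datetime(['2024-01-01 09:02', '2024-01-01 09:04', '2024-01-01 09:05']),
}, index=pd.Index([1, 2, 3], name='customer_no'))

# +1 when a customer is first seen, -1 when they are last seen
arrivals = pd.Series(1, index=pd.DatetimeIndex(visits['min']))
departures = pd.Series(-1, index=pd.DatetimeIndex(visits['max']))

# net change per timestamp, in chronological order
events = pd.concat([arrivals, departures]).groupby(level=0).sum().sort_index()

# running total = customers in the supermarket after each event
in_store = events.cumsum()
print(in_store)
```

The series ends at 0 once every customer has left; a customer counts as gone from the moment of their last record.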

In [None]:
# Our business managers think that the first section customers visit follows a different pattern than the following ones. Plot the distribution of customers over their first visited section versus the following sections (treat all sections visited after the first as “following”).

# df.groupby(['customer_no']).agg({'location': [' -> '.join, 'count']})

# df.groupby(['customer_no'])['location'].describe().sort_values(by='freq', ascending=False)
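
A possible sketch for the first-versus-following comparison, on made-up data: number each customer's rows in time order, label row 0 as `first` and the rest as `following`, then compare the share of each section within both groups. The `visit` column and the toy rows are assumptions.

```python
import pandas as pd

# toy data (made up)
toy = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2024-01-01 09:00', '2024-01-01 09:01', '2024-01-01 09:03',
        '2024-01-01 09:00', '2024-01-01 09:02',
    ]),
    'customer_no': [1, 1, 1, 2, 2],
    'location': ['fruit', 'dairy', 'checkout', 'dairy', 'checkout'],
})

toy = toy.sort_values(['customer_no', 'timestamp'])

# cumcount() numbers each customer's rows 0, 1, 2, ...; 0 is the first section
is_first = toy.groupby('customer_no').cumcount().eq(0)
toy['visit'] = is_first.map({True: 'first', False: 'following'})

# share of each section within the first and the following visits
distribution = pd.crosstab(toy['location'], toy['visit'], normalize='columns')
print(distribution)
```

With matplotlib installed, `distribution.plot.bar()` would draw the two distributions side by side.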

In [None]:
# df.groupby(['customer_no'])['timestamp'].describe()
# # .sort_values(by='freq', ascending=False)

In [None]:
df[df['customer_no'] == 19854]

### Revenue Estimate

Estimate the total revenue for a customer using the following table:

| section | revenue per minute |
|---------|:------------------:|
| fruit   | 4€                 |
| spices  | 3€                 |
| dairy   | 5€                 |
| drinks  | 6€                 |

Which is the most profitable section according to your data?
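
A minimal sketch of the estimate on made-up data, assuming the time spent in a section is the gap until the same customer's next record. The rates come from the table above; the toy rows, the `minutes` and `revenue` columns and the variable names are assumptions.

```python
import pandas as pd

# revenue per minute in euros, taken from the table above
revenue_per_minute = {'fruit': 4, 'spices': 3, 'dairy': 5, 'drinks': 6}

# toy data (made up)
toy = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2024-01-01 09:00', '2024-01-01 09:02', '2024-01-01 09:03',
        '2024-01-01 09:00', '2024-01-01 09:03',
    ]),
    'customer_no': [1, 1, 1, 2, 2],
    'location': ['fruit', 'dairy', 'checkout', 'drinks', 'checkout'],
})

toy = toy.sort_values(['customer_no', 'timestamp'])

# minutes in a section = time until the same customer's next record
next_ts = toy.groupby('customer_no')['timestamp'].shift(-1)
toy['minutes'] = (next_ts - toy['timestamp']).dt.total_seconds() / 60

# checkout has no rate, so it maps to NaN and adds nothing to the sums
toy['revenue'] = toy['minutes'] * toy['location'].map(revenue_per_minute)

revenue_per_customer = toy.groupby('customer_no')['revenue'].sum()
revenue_per_section = (
    toy.groupby('location')['revenue'].sum().sort_values(ascending=False)
)
print(revenue_per_section)
```

The first entry of `revenue_per_section` is the most profitable section for the data at hand.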