### Objective

**This notebook simulates year-to-year customer transitions between states of engagement, using the following buckets**:

- **Active** *(More than 3 transactions in the last 2 years with at least one transaction in each year)*
- **Occasional** *(3 or fewer transactions in the last 2 years with at least one transaction in each year; spelled `occassional` in the data and code)*
- **Dormant** *(At least one transaction two years ago with no transactions last year)*
- **Lapsed** *(No transactions in either of the last two years)*
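As a rough illustration, the buckets above can be written as a small function of two yearly transaction counts. This is a sketch, not the code that produced the labels in the data: `classify_state` and its argument names are hypothetical. The bullets do not say how to label a customer who transacted last year but not the year before; the sample rows shown later in this notebook suggest such customers are bucketed by their two-year total, and the sketch assumes that.

```python
import math


def classify_state(freq_prev, freq_last):
    """Bucket a customer from transaction counts in the previous and the latest year.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    def _count(x):
        # Missing counts (None or NaN) are treated as zero transactions.
        if x is None or (isinstance(x, float) and math.isnan(x)):
            return 0
        return int(x)

    prev, last = _count(freq_prev), _count(freq_last)
    if last == 0:
        # No activity in the latest year: dormant only if the year before had activity.
        return 'dormant' if prev > 0 else 'lapsed'
    # Activity in the latest year: split on the two-year total.
    # ASSUMPTION: this also covers the case prev == 0, which the bullets leave open.
    return 'active' if prev + last > 3 else 'occassional'


print(classify_state(2, 1))             # three transactions over two years
print(classify_state(2, float('nan')))  # nothing in the latest year
```

The label `occassional` keeps the spelling used in the data so the output can be compared with the `state_*` columns directly.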


The goal is to establish a baseline (a simple Markov chain) on which a Hidden Markov Model can later be built to better model how customers move from one state of engagement to another year over year.

[Creating the Function](#fxn) <br>
[Monte Carlo Simulation](#monte)<br>
[Credit for Code](#credit)<br>

In [1]:
from __future__ import division, print_function, absolute_import

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, classification_report, multilabel_confusion_matrix

import re
import logging
import warnings
import time

pd.set_option('display.float_format', lambda x: '%.3f' % x)

%config Application.log_level = "ERROR"

warnings.filterwarnings(action='once')

# Convert CamelCase column names to snake_case.
def snakify(column_name):
    s1 = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', column_name)
    return re.sub('([a-z0-9])([A-Z])', r'\1_\2', s1).lower()

In [2]:
class CustomerXsition(object):
    def __init__(self, transition_matrix, states):
        """
        Initializes the Markov Chain.

        Parameters
        ----------
        transition_matrix: 2-d array of shape (n, n), where n is the number of states
            Matrix of the probabilities of changing from one state (row) to another (column).
        states: 1-d array
            Array of the states, in the same order as the rows and columns of the transition matrix.
        """
        self.transition_matrix = np.atleast_2d(transition_matrix)
        self.states = states
        self.index_dict = {
            self.states[index]: index
            for index in range(len(self.states))
        }
        self.state_dict = {
            index: self.states[index]
            for index in range(len(self.states))
        }

    def next_state(self, current_state):
        """
        Simulates the next state based on the current state and the probability of transitioning to
        other states.

        Parameters
        ----------
        current_state: str
            The customer's current state.
        """
        return np.random.choice(
            self.states,
            p=self.transition_matrix[self.index_dict[current_state], :])

In [3]:
transition_matrix = [[0.767, 0.082, 0.151, 0], [0.179, 0.291, 0.530, 0],
                     [0.039, 0.317, 0, 0.644], [0.004, 0.083, 0, 0.913]]

states = ['active', 'occassional', 'dormant', 'lapsed']

xsition = CustomerXsition(transition_matrix, states)
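`np.random.choice` raises an error when the probabilities passed as `p` do not sum to 1, so it can help to validate a hand-entered matrix before simulating. The following is a minimal sketch in plain Python; `check_row_stochastic` is a hypothetical helper that is not part of the class above.

```python
import math

transition_matrix = [[0.767, 0.082, 0.151, 0], [0.179, 0.291, 0.530, 0],
                     [0.039, 0.317, 0, 0.644], [0.004, 0.083, 0, 0.913]]
states = ['active', 'occassional', 'dormant', 'lapsed']


def check_row_stochastic(matrix, states, tol=1e-8):
    """Raise ValueError unless every row is a probability distribution over `states`."""
    if len(matrix) != len(states):
        raise ValueError('expected one row per state')
    for state, row in zip(states, matrix):
        if len(row) != len(states):
            raise ValueError('row for %r must have one entry per state' % state)
        if any(p < 0 for p in row):
            raise ValueError('row for %r contains a negative probability' % state)
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            raise ValueError('row for %r sums to %r, not 1' % (state, sum(row)))
    return True


print(check_row_stochastic(transition_matrix, states))
```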

In [4]:
for state in states:
    print('The current state is:', state)
    for i in range(10):
        next_state = xsition.next_state(state)
        print('The next state is', next_state)
        state = next_state

    print('\n')

The current state is: active
The next state is active
The next state is active
The next state is active
The next state is active
The next state is active
The next state is active
The next state is active
The next state is active
The next state is active
The next state is active


The current state is: occassional
The next state is occassional
The next state is dormant
The next state is lapsed
The next state is lapsed
The next state is lapsed
The next state is lapsed
The next state is lapsed
The next state is lapsed
The next state is lapsed
The next state is lapsed


The current state is: dormant
The next state is lapsed
The next state is lapsed
The next state is lapsed
The next state is lapsed
The next state is lapsed
The next state is lapsed
The next state is occassional
The next state is active
The next state is dormant
The next state is lapsed


The current state is: lapsed
The next state is lapsed
The next state is lapsed
The next state is lapsed
The next state is lapsed
The next s

In [5]:
data = pd.read_csv('xsition.csv')
data.columns = [snakify(col) for col in data.columns]
data.head(10)

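| | address_id | freq_2016 | freq_2017 | freq_2018 | freq_2019 | state_2017 | state_2018 | state_2019 |
|---|---|---|---|---|---|---|---|---|
| 0 | 3000049708318 | NaN | NaN | NaN | NaN | lapsed | lapsed | lapsed |
| 1 | 3000056367699 | NaN | NaN | NaN | NaN | lapsed | lapsed | lapsed |
| 2 | 3000068725008 | NaN | NaN | NaN | NaN | lapsed | lapsed | lapsed |
| 3 | 3000396727694 | 1.0 | NaN | NaN | 1.0 | dormant | lapsed | occassional |
| 4 | 3000148387504 | NaN | NaN | NaN | NaN | lapsed | lapsed | lapsed |
| 5 | 3000138788077 | NaN | 2.0 | NaN | NaN | occassional | dormant | lapsed |
| 6 | 3000081144958 | NaN | NaN | NaN | NaN | lapsed | lapsed | lapsed |
| 7 | 3000102778408 | NaN | NaN | NaN | NaN | lapsed | lapsed | lapsed |
| 8 | 3000083699077 | 2.0 | 1.0 | 4.0 | 1.0 | occassional | active | active |
| 9 | 3000014225696 | 19.0 | 15.0 | 13.0 | 5.0 | active | active | active |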


<a id='fxn'></a>

In [6]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000000 entries, 0 to 9999999
Data columns (total 8 columns):
address_id    int64
freq_2016     float64
freq_2017     float64
freq_2018     float64
freq_2019     float64
state_2017    object
state_2018    object
state_2019    object
dtypes: float64(4), int64(1), object(3)
memory usage: 610.4+ MB


In [7]:
class CustomerXsition_v2(object):
    """
    Class to create a Markov transition matrix from a pandas DataFrame, given the columns
    that hold the observed states, and to forecast future states based on the empirical
    probabilities in the transition matrix.

    Monte Carlo simulations can be run using randomly generated sample data. 
    """

    def __init__(self, data, col1, col2):
        """
        Initialize the Markov Chain process using the underlying pandas DataFrame with
        the assumption that each possible state occurs in col1.

        Parameters
        ----------
        data : pandas DataFrame
            dataframe holding states in a column.
        col1 : str
            name of column holding current customer state.
        col2 : str
            name of column holding next customer state.
        """
        self.data = data
        self.col1 = col1
        self.col2 = col2
        self.states = list(self.data[self.col1].unique())

        self.index_dict = {
            self.states[index]: index
            for index in range(len(self.states))
        }
        self.state_dict = {
            index: self.states[index]
            for index in range(len(self.states))
        }

    def make_matrix(self, transition_matrix=None):
        """
        Create the transition matrix from the underlying data or simply pass it if known.

        Parameters
        ----------
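        transition_matrix: 2-d array of shape (n, n), where n is the number of possible states.
            Rows and columns must follow the order of self.states.
        """
        # Compare against None: the truth value of a NumPy array is ambiguous.
        if transition_matrix is not None:
            self.transition_matrix = np.atleast_2d(transition_matrix)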
        else:
            self.transition_matrix = np.asarray([[
                len(self.data[(self.data[self.col1] == self.state_dict[i])
                              & (self.data[self.col2] == self.state_dict[j])])
                / len(self.data[self.data[self.col1] == self.state_dict[i]])
                for j in range(len(self.states))
            ] for i in range(len(self.states))])

        return self.transition_matrix

    def next_state(self, current_state):
        """
        Simulates the next state based on the current state and the probability of transitioning to
        other states.

        Parameters
        ----------
        current_state: str
            The customer's current state.
        """
        return np.random.choice(
            self.states,
            p=self.transition_matrix[self.index_dict[current_state], :])

    def generate_states(self, current_state, n=10):
        """
        Simulates the next n states based on the current state.

        Parameters
        ----------
        current_state: str
             The customer's current state.
        n: int
             The number of steps into the future to simulate.
        """
        future_states = []
        for i in range(n):
            next_state = self.next_state(current_state)
            future_states.append(next_state)
            current_state = next_state

        return future_states

    def monte_carlo_sim(self, samp_size=None, samp_n=0.05, n_sim=1000000):
        """
        Performs a Monte Carlo simulation using randomly generated samples of the data n_sim times.

        Parameters
        ----------
        samp_size : int
            size of desired sample
        samp_n : float
            fraction of the data used to create each sample (e.g. 0.05 for 5%).
            Used when samp_size is not explicitly passed.
        n_sim : int
            no. of loops/simulations.
            Default is 1 million.
        """
        self.nested_dict = {
            self.states[index]:
            {self.states[index]: []
             for index in range(len(self.states))}
            for index in range(len(self.states))
        }

        for _ in range(n_sim):
            if samp_size:
                self.sample = self.data.sample(samp_size)
            else:
                self.sample = self.data.sample(int(samp_n * len(self.data)))

            # Simulate one next state per sampled customer, grouped by current state.
            f_states = {
                state: [
                    self.next_state(state) for _ in range(
                        len(self.sample[self.sample[self.col1] == state]))
                ]
                for state in self.states
            }
            # Record how many customers moved from s to each state in this run.
            for s in self.states:
                for state in self.states:
                    self.nested_dict[s][state].append(f_states[s].count(state))

        # Note: the denominator is the state count in the final sample only, so the
        # printed percentages for a row may not sum to exactly 100.
        for s in self.states:
            for state in self.states:
                print(
                    s, 'to', state + ':',
                    round(
                        np.mean(self.nested_dict[s][state]) /
                        len(self.sample[self.sample[self.col1] == s]) * 100, 2),
                    '% chance')
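To make explicit what `make_matrix` computes, here is a minimal sketch of the same empirical estimate in plain Python: count each (current, next) pair and divide by the number of times the current state occurs. The function name `empirical_transition_matrix` and the six-row toy data are hypothetical. In pandas, `pd.crosstab` with `normalize='index'` would be a possible one-line alternative.

```python
from collections import Counter


def empirical_transition_matrix(current, nxt):
    """Estimate P[i][j] = count(state i -> state j) / count(state i)."""
    # States in order of first appearance, mirroring pandas' unique().
    states = list(dict.fromkeys(current))
    pair_counts = Counter(zip(current, nxt))
    row_totals = Counter(current)
    matrix = [[pair_counts[(a, b)] / row_totals[a] for b in states]
              for a in states]
    return states, matrix


# Toy data (hypothetical): the state of six customers in two consecutive years.
current = ['lapsed', 'dormant', 'lapsed', 'occassional', 'active', 'lapsed']
nxt = ['lapsed', 'lapsed', 'occassional', 'dormant', 'active', 'lapsed']

states, matrix = empirical_transition_matrix(current, nxt)
for state, row in zip(states, matrix):
    print(state, [round(p, 3) for p in row])
```

Counting pairs once makes a single pass over the data, whereas the nested comprehension in `make_matrix` filters the DataFrame once per matrix cell.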

In [8]:
print(CustomerXsition_v2.__doc__)


    Class to create a Markov transition matrix from a pandas DataFrame, given the columns
    that hold the observed states, and to forecast future states based on the empirical
    probabilities in the transition matrix.

    Monte Carlo simulations can be run using randomly generated sample data. 
    


In [9]:
print(CustomerXsition_v2.__init__.__doc__)


        Initialize the Markov Chain process using the underlying pandas DataFrame with
        the assumption that each possible state occurs in col1.

        Parameters
        ----------
        data : pandas DataFrame
            dataframe holding states in a column.
        col1 : str
            name of column holding current customer state.
        col2 : str
            name of column holding next customer state.
        


In [10]:
cXsition = CustomerXsition_v2(data, 'state_2017', 'state_2018')
mx = cXsition.make_matrix()

In [11]:
mx

array([[0.91300026, 0.        , 0.08265169, 0.00434805],
       [0.64400022, 0.        , 0.31709816, 0.03890162],
       [0.        , 0.53016917, 0.29035774, 0.17947309],
       [0.        , 0.15116895, 0.08205081, 0.76678024]])

In [12]:
cXsition.states

['lapsed', 'dormant', 'occassional', 'active']

In [13]:
for state in cXsition.states:
    print('The current state is:', state, '\nA simulated next state is:',
          cXsition.next_state(state))
    print('\n')

The current state is: lapsed 
A simulated next state is: occassional


The current state is: dormant 
A simulated next state is: lapsed


The current state is: occassional 
A simulated next state is: occassional


The current state is: active 
A simulated next state is: active




In [14]:
cXsition_2 = CustomerXsition_v2(data, 'state_2018', 'state_2019')
mx_2 = cXsition_2.make_matrix()

In [15]:
mx_2

array([[0.91322847, 0.        , 0.00417906, 0.08259247],
       [0.66953397, 0.        , 0.0323612 , 0.29810483],
       [0.        , 0.17933841, 0.73381535, 0.08684624],
       [0.        , 0.55338016, 0.16324041, 0.28337942]])

In [16]:
cXsition_2.states

['lapsed', 'dormant', 'active', 'occassional']
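Because the state list is taken from the order in which values first appear in the column, `cXsition.states` and `cXsition_2.states` come out in different orders, so `mx` and `mx_2` cannot be compared cell by cell as printed. A minimal sketch of aligning a matrix to a chosen state order follows; `reorder_matrix` is a hypothetical helper, and the values are copied from the `mx_2` output above.

```python
def reorder_matrix(matrix, states, order):
    """Return `matrix` with rows and columns permuted from `states` order to `order`."""
    idx = [states.index(s) for s in order]
    return [[matrix[i][j] for j in idx] for i in idx]


mx_2 = [[0.91322847, 0.0, 0.00417906, 0.08259247],
        [0.66953397, 0.0, 0.0323612, 0.29810483],
        [0.0, 0.17933841, 0.73381535, 0.08684624],
        [0.0, 0.55338016, 0.16324041, 0.28337942]]
states_2 = ['lapsed', 'dormant', 'active', 'occassional']

# Target order: the one reported by cXsition.states for the 2017 -> 2018 matrix.
order = ['lapsed', 'dormant', 'occassional', 'active']

aligned = reorder_matrix(mx_2, states_2, order)
for state, row in zip(order, aligned):
    print(state, row)
```

Once aligned, the first row reads `[0.91322847, 0.0, 0.08259247, 0.00417906]`, which lines up with the first row of `mx`.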

In [17]:
# simulating a customer's state in 2019...

data['pred_state_2019'] = data['state_2018'].apply(cXsition.next_state)

In [18]:
data.head()

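| | address_id | freq_2016 | freq_2017 | freq_2018 | freq_2019 | state_2017 | state_2018 | state_2019 | pred_state_2019 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 3000049708318 | NaN | NaN | NaN | NaN | lapsed | lapsed | lapsed | lapsed |
| 1 | 3000056367699 | NaN | NaN | NaN | NaN | lapsed | lapsed | lapsed | lapsed |
| 2 | 3000068725008 | NaN | NaN | NaN | NaN | lapsed | lapsed | lapsed | lapsed |
| 3 | 3000396727694 | 1.0 | NaN | NaN | 1.0 | dormant | lapsed | occassional | lapsed |
| 4 | 3000148387504 | NaN | NaN | NaN | NaN | lapsed | lapsed | lapsed | lapsed |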


In [19]:
data['state_2019'].value_counts()

lapsed         6511953
occassional    1270240
active         1268657
dormant         949150
Name: state_2019, dtype: int64

In [20]:
data['pred_state_2019'].value_counts()

lapsed         6490321
active         1342167
occassional    1286367
dormant         881145
Name: pred_state_2019, dtype: int64

In [21]:
print(states)
confusion_matrix(data['state_2019'].values,
                 data['pred_state_2019'].values,
                 labels=states)

['active', 'occassional', 'dormant', 'lapsed']


array([[ 810359,  153621,  262303,   42374],
       [ 168591,  238375,  208946,  654328],
       [ 315358,  223896,  409896,       0],
       [  47859,  670475,       0, 5793619]], dtype=int64)

In [22]:
multilabel_confusion_matrix(data['state_2019'],
                            data['pred_state_2019'],
                            labels=states)

array([[[8199535,  531808],
        [ 458298,  810359]],

       [[7681768, 1047992],
        [1031865,  238375]],

       [[8579601,  471249],
        [ 539254,  409896]],

       [[2791345,  696702],
        [ 718334, 5793619]]], dtype=int64)

In [24]:
print(
    classification_report(data['state_2019'].values,
                          data['pred_state_2019'].values,
                          labels=states,
                          target_names=states))

              precision    recall  f1-score   support

      active       0.60      0.64      0.62   1268657
 occassional       0.19      0.19      0.19   1270240
     dormant       0.47      0.43      0.45    949150
      lapsed       0.89      0.89      0.89   6511953

    accuracy                           0.73  10000000
   macro avg       0.54      0.54      0.54  10000000
weighted avg       0.73      0.73      0.73  10000000
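The figures in the classification report can be recovered from the confusion matrix printed earlier, where rows are the true 2019 states and columns are the simulated ones. The sketch below is plain Python with the counts copied from that output; `per_class_metrics` is a hypothetical helper, not a scikit-learn function.

```python
labels = ['active', 'occassional', 'dormant', 'lapsed']
cm = [[810359, 153621, 262303, 42374],
      [168591, 238375, 208946, 654328],
      [315358, 223896, 409896, 0],
      [47859, 670475, 0, 5793619]]


def per_class_metrics(cm, labels):
    """Compute precision, recall and support per label from a square confusion matrix."""
    metrics = {}
    for k, label in enumerate(labels):
        true_pos = cm[k][k]
        actual = sum(cm[k])                     # row sum: customers truly in this state
        predicted = sum(row[k] for row in cm)   # column sum: customers simulated into it
        metrics[label] = {
            'precision': true_pos / predicted if predicted else 0.0,
            'recall': true_pos / actual if actual else 0.0,
            'support': actual,
        }
    return metrics


metrics = per_class_metrics(cm, labels)
# Accuracy is the share of customers on the diagonal.
accuracy = sum(cm[k][k] for k in range(len(labels))) / sum(map(sum, cm))

for label in labels:
    m = metrics[label]
    print(label, round(m['precision'], 2), round(m['recall'], 2), m['support'])
print('accuracy', round(accuracy, 2))
```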



<a id='monte'></a>

### Monte Carlo Simulation

In [61]:
def monte_sim(data, column, samp_size=None, samp_n=0.1, loops=100000):
    # Relies on the global `cXsition` defined above for the transition probabilities.
    states = list(data[column].unique())

    # nested_dict[s][state] collects, per loop, how many sampled customers moved from s to state.
    nested_dict = {s: {state: [] for state in states} for s in states}

    for _ in range(loops):
        if samp_size:
            sample = data.sample(samp_size)
        else:
            sample = data.sample(int(samp_n * len(data)))

        # Simulate one next state per sampled customer, grouped by current state.
        f_states = {
            state: [
                cXsition.next_state(state)
                for _ in range(len(sample[sample[column] == state]))
            ]
            for state in states
        }
        for s in states:
            for state in states:
                nested_dict[s][state].append(f_states[s].count(state))

    # Note: the denominator is the state count in the final sample only, so the
    # printed percentages for a row may not sum to exactly 100.
    for s in states:
        for state in states:
            print(s, 'to', state + ':',
                  round(np.mean(nested_dict[s][state]) / len(sample[sample[column] == s]) * 100, 4),
                  '% chance')

In [62]:
monte_sim(data=data, column='state_2017', samp_n=0.05, loops=1000000)

lapsed to lapsed: 91.3675 % chance
lapsed to dormant: 0.0 % chance
lapsed to occassional: 8.3084 % chance
lapsed to active: 0.4295 % chance
dormant to lapsed: 63.3775 % chance
dormant to dormant: 0.0 % chance
dormant to occassional: 31.3723 % chance
dormant to active: 3.8966 % chance
occassional to lapsed: 0.0 % chance
occassional to dormant: 53.7415 % chance
occassional to occassional: 29.1366 % chance
occassional to active: 18.2138 % chance
active to lapsed: 0.0 % chance
active to dormant: 15.0188 % chance
active to occassional: 8.1206 % chance
active to active: 76.2912 % chance
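The percentages above sit close to the rows of `mx` (for example, 91.37% for lapsed to lapsed against 0.9130 in the matrix). That is expected, since the simulation repeatedly draws from those same rows. The sketch below illustrates the convergence for a single row, using the standard library's `random.choices` in place of the notebook's `np.random.choice`; `estimate_row`, the seed and the number of draws are assumptions made for the example.

```python
import random


def estimate_row(states, probs, n_draws, seed=0):
    """Estimate one transition-matrix row from the frequencies of simulated draws."""
    rng = random.Random(seed)  # seeded so the illustration is reproducible
    draws = rng.choices(states, weights=probs, k=n_draws)
    return {s: draws.count(s) / n_draws for s in states}


states = ['lapsed', 'dormant', 'occassional', 'active']
lapsed_row = [0.91300026, 0.0, 0.08265169, 0.00434805]  # first row of mx

estimate = estimate_row(states, lapsed_row, n_draws=100000)
for state, p in zip(states, lapsed_row):
    print('lapsed to', state + ':', 'simulated', round(estimate[state], 4), 'vs', round(p, 4))
```

With enough draws the simulated frequencies approach the matrix entries, so the Monte Carlo run mainly serves as a sanity check on the sampling code rather than as a source of new estimates.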


<a id='credit'></a>

Credit for code:<br>
**Alessandro Molina on Medium (Markov Chains with Python)** <br>
https://medium.com/@__amol__/markov-chains-with-python-1109663f3678