### Objective

**This notebook simulates how customers move from one state of engagement to another, year to year, based on the following buckets**:

- **Active** *(More than 3 transactions in the last 2 years with at least one transaction in each year)*
- **Occasional** *(3 or fewer transactions in the last 2 years with at least one transaction in each year; spelled `occassional` in the data and code)*
- **Dormant** *(At least one transaction two years ago with no transactions last year)*
- **Lapsed** *(No transactions in either of the last two years)*
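
The bucket rules above can be sketched as a small function. This is a minimal illustration, not the code that produced the `state_*` columns used later: the name `classify_state` and its arguments are hypothetical, and the rules above do not say what happens when a customer transacted last year but not the year before. The sample rows shown later label such customers `occassional`, so the sketch assumes that.

```python
import math


def classify_state(freq_prev, freq_last):
    """Assign an engagement state from two yearly transaction counts.

    freq_prev: transactions two years ago.
    freq_last: transactions last year.
    Missing values (None or NaN) are treated as zero transactions.
    """
    def as_count(x):
        if x is None or (isinstance(x, float) and math.isnan(x)):
            return 0
        return int(x)

    prev, last = as_count(freq_prev), as_count(freq_last)

    if prev == 0 and last == 0:
        return 'lapsed'
    if last == 0:
        return 'dormant'
    if prev == 0:
        # ASSUMPTION: this case is not covered by the stated rules.
        return 'occassional'
    # At least one transaction in each year: split on the two-year total.
    return 'active' if prev + last > 3 else 'occassional'


print(classify_state(2.0, 1.0))
print(classify_state(1.0, 4.0))
```

The state labels keep the spelling used in the data (`occassional`) so that the output stays comparable with the columns loaded later.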


The expectation is to establish a baseline (a simple Markov chain) for year-over-year customer transitions between states of engagement, on which a Hidden Markov Model can be built in the near future to model those transitions better.

[Credit for Code](#credit)<br>

In [1]:
from __future__ import division, print_function, absolute_import

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, classification_report, multilabel_confusion_matrix

import re
import logging
import warnings

pd.set_option('display.float_format', lambda x: '%.3f' % x)

%config Application.log_level = "ERROR"

warnings.filterwarnings(action='once')

def snakify(column_name):
    s1 = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', column_name)
    return re.sub('([a-z0-9])([A-Z])', r'\1_\2', s1).lower()

In [2]:
class CustomerXsition(object):
    def __init__(self, transition_matrix, states):
        '''
        Initializes the Markov Chain.

        Parameters
        ----------
        transition_matrix: 2-d array of shape (n_states, n_states)
            Matrix representing the probabilities of changing from one state to another.
        states: 1-d array
            Array of the states in same order as transition matrix
        '''
        self.transition_matrix = np.atleast_2d(transition_matrix)
        self.states = states
        self.index_dict = {
            self.states[index]: index
            for index in range(len(self.states))
        }
        self.state_dict = {
            index: self.states[index]
            for index in range(len(self.states))
        }

    def next_state(self, current_state):
        '''
        Simulates the next state based on the current state and the probability of transitioning to
        other states.

        Parameters
        ----------
        current_state: str
            The customer's current state.
        '''
        return np.random.choice(
            self.states,
            p=self.transition_matrix[self.index_dict[current_state], :])

In [3]:
transition_matrix = [[0.767, 0.082, 0.151, 0], [0.179, 0.291, 0.530, 0],
                     [0.039, 0.317, 0, 0.644], [0.004, 0.083, 0, 0.913]]

states = ['active', 'occassional', 'dormant', 'lapsed']

xsition = CustomerXsition(transition_matrix, states)
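
Since `np.random.choice` needs each row of probabilities to sum to 1, it may help to validate a hand-typed matrix before simulating. The helper below is a sketch; `check_transition_matrix` and its `tol` parameter are hypothetical names, not part of the original notebook.

```python
import numpy as np


def check_transition_matrix(matrix, states, tol=1e-6):
    """Return the matrix as a float array, or raise ValueError if it is not
    a square, non-negative matrix whose rows each sum to 1."""
    m = np.asarray(matrix, dtype=float)
    n = len(states)
    if m.shape != (n, n):
        raise ValueError('expected shape (%d, %d), got %s' % (n, n, m.shape))
    if (m < 0).any():
        raise ValueError('probabilities must be non-negative')
    row_sums = m.sum(axis=1)
    if not np.allclose(row_sums, 1.0, atol=tol):
        raise ValueError('rows must sum to 1, got %s' % row_sums)
    return m


transition_matrix = [[0.767, 0.082, 0.151, 0], [0.179, 0.291, 0.530, 0],
                     [0.039, 0.317, 0, 0.644], [0.004, 0.083, 0, 0.913]]
states = ['active', 'occassional', 'dormant', 'lapsed']

checked = check_transition_matrix(transition_matrix, states)
print(checked.sum(axis=1))
```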

In [4]:
for state in states:
    print('The current state is:', state)
    for i in range(10):
        next_state = xsition.next_state(state)
        print('The next state is', next_state)
        state = next_state

    print('\n')

The current state is: active
The next state is active
The next state is dormant
The next state is lapsed
The next state is lapsed
The next state is lapsed
The next state is lapsed
The next state is lapsed
The next state is lapsed
The next state is lapsed
The next state is occassional


The current state is: occassional
The next state is dormant
The next state is active
The next state is active
The next state is active
The next state is active
The next state is active
The next state is active
The next state is active
The next state is active
The next state is occassional


The current state is: dormant
The next state is lapsed
The next state is lapsed
The next state is lapsed
The next state is lapsed
The next state is lapsed
The next state is lapsed
The next state is lapsed
The next state is lapsed
The next state is lapsed
The next state is occassional


The current state is: lapsed
The next state is occassional
The next state is dormant
The next state is lapsed
The next state is lapsed
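
A single simulated path is noisy. As a complement, the probability of being in each state after `n` years can be computed directly by raising the transition matrix to the `n`-th power. The sketch below reuses the matrix and state order from the cell above; `distribution_after` is a hypothetical helper name.

```python
import numpy as np

states = ['active', 'occassional', 'dormant', 'lapsed']
P = np.array([[0.767, 0.082, 0.151, 0], [0.179, 0.291, 0.530, 0],
              [0.039, 0.317, 0, 0.644], [0.004, 0.083, 0, 0.913]])


def distribution_after(start_state, n_steps):
    """Probability of each state n_steps years after starting in start_state."""
    start = np.zeros(len(states))
    start[states.index(start_state)] = 1.0
    # Row vector times P^n gives the n-step distribution.
    return start @ np.linalg.matrix_power(P, n_steps)


one_step = distribution_after('active', 1)    # equals the 'active' row of P
ten_steps = distribution_after('active', 10)

for state, prob in zip(states, ten_steps):
    print('%-12s %.3f' % (state, prob))
```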

In [5]:
data = pd.read_csv('xsition.csv')
data.columns = [snakify(col) for col in data.columns]
data.head(10)

Unnamed: 0,address_id,freq_2016,freq_2017,freq_2018,freq_2019,state_2017,state_2018,state_2019
0,3000049708318,,,,,lapsed,lapsed,lapsed
1,3000056367699,,,,,lapsed,lapsed,lapsed
2,3000068725008,,,,,lapsed,lapsed,lapsed
3,3000396727694,1.0,,,1.0,dormant,lapsed,occassional
4,3000148387504,,,,,lapsed,lapsed,lapsed
5,3000138788077,,2.0,,,occassional,dormant,lapsed
6,3000081144958,,,,,lapsed,lapsed,lapsed
7,3000102778408,,,,,lapsed,lapsed,lapsed
8,3000083699077,2.0,1.0,4.0,1.0,occassional,active,active
9,3000014225696,19.0,15.0,13.0,5.0,active,active,active


In [6]:
class CustomerXsition_v2(object):
    def __init__(self, data, col1, col2):
        '''
        Initialize the Markov process using the underlying data passed via pandas DataFrame.
        The assumption is that each possible state occurs in col1 at the very least.

        Parameters
        ----------
        data : pandas DataFrame
            dataframe holding customer data.
        col1 : str
            name of column holding current customer state.
        col2 : str
            name of column holding next customer state.
        '''
        self.data = data
        self.col1 = col1
        self.col2 = col2
        self.states = list(self.data[self.col1].unique())

        self.index_dict = {
            self.states[index]: index
            for index in range(len(self.states))
        }
        self.state_dict = {
            index: self.states[index]
            for index in range(len(self.states))
        }

    def make_matrix(self, transition_matrix=None):
        '''
        Create the transition matrix from the underlying data or simply pass it if known.

        Parameters
        ----------
        transition_matrix: 2-d array of shape (n, n), optional, where n is the number of possible states.
            Rows and columns must follow the order of self.states.
        '''
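        if transition_matrix is not None:
            # `is not None` avoids the ambiguous truth value of a NumPy array;
            # np.asarray lets next_state index rows even if a nested list is passed.
            self.transition_matrix = np.asarray(transition_matrix)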
        else:
            self.transition_matrix = np.asarray([[
                len(self.data[(self.data[self.col1] == self.state_dict[i])
                              & (self.data[self.col2] == self.state_dict[j])])
                / len(self.data[self.data[self.col1] == self.state_dict[i]])
                for j in range(len(self.states))
            ] for i in range(len(self.states))])

        return self.transition_matrix

    def next_state(self, current_state):
        '''
        Simulates the next state based on the current state and the probability of transitioning to
        other states.

        Parameters
        ----------
        current_state: str
            The customer's current state.
        '''
        return np.random.choice(
            self.states,
            p=self.transition_matrix[self.index_dict[current_state], :])

    def generate_states(self, current_state, n=10):
        '''
        Simulates the next n states based on the current state.

        Parameters
        ----------
        current_state: str
             The customer's current state.
        n: int
             The number of steps into the future to simulate.
        '''
        future_states = []
        for i in range(n):
            next_state = self.next_state(current_state)
            future_states.append(next_state)
            current_state = next_state

        return future_states
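
The nested comprehension in `make_matrix` filters the DataFrame once per pair of states. One possible alternative is `pd.crosstab` with `normalize='index'`, which counts all transitions in a single pass. This is a sketch on made-up toy data; `transition_matrix_from_data` is a hypothetical name, and it assumes neither column contains missing values (`crosstab` drops such rows, whereas the class above keeps them in the denominator).

```python
import pandas as pd


def transition_matrix_from_data(data, col1, col2):
    """Row-normalised transition probabilities from col1 (current) to col2 (next)."""
    states = list(data[col1].unique())
    probs = pd.crosstab(data[col1], data[col2], normalize='index')
    # Use the same state order for rows and columns; unseen transitions get 0.
    return probs.reindex(index=states, columns=states, fill_value=0.0)


# Toy data for illustration only (not taken from xsition.csv).
toy = pd.DataFrame({
    'state_2017': ['active', 'active', 'lapsed', 'lapsed', 'lapsed', 'dormant'],
    'state_2018': ['active', 'dormant', 'lapsed', 'lapsed', 'active', 'lapsed'],
})

tm = transition_matrix_from_data(toy, 'state_2017', 'state_2018')
print(tm)
```

Returning a labelled DataFrame rather than a bare array keeps the state names attached to each row and column.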

In [7]:
cXsition = CustomerXsition_v2(data, 'state_2017', 'state_2018')
mx = cXsition.make_matrix()

In [8]:
mx

array([[0.91300026, 0.        , 0.08265169, 0.00434805],
       [0.64400022, 0.        , 0.31709816, 0.03890162],
       [0.        , 0.53016917, 0.29035774, 0.17947309],
       [0.        , 0.15116895, 0.08205081, 0.76678024]])

In [9]:
cXsition.states

['lapsed', 'dormant', 'occassional', 'active']

In [10]:
for state in cXsition.states:
    print('The current state is:', state, '\nA simulated next state is:',
          cXsition.next_state(state))
    print('\n')

The current state is: lapsed 
A simulated next state is: lapsed


The current state is: dormant 
A simulated next state is: lapsed


The current state is: occassional 
A simulated next state is: active


The current state is: active 
A simulated next state is: active




In [11]:
# simulating a customer's state in 2019...

data['pred_state_2019'] = data['state_2018'].apply(cXsition.next_state)
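
Calling `next_state` once per row through `apply` can be slow on millions of rows. A possible alternative is to draw all samples for each current state in one call. The sketch below uses NumPy's `Generator` API on toy input; `simulate_next_states` and its `seed` parameter are hypothetical names, and rows whose state is not listed in `states` are left as `None`.

```python
import numpy as np
import pandas as pd


def simulate_next_states(current, states, transition_matrix, seed=None):
    """Sample a next state for every entry of `current`, one batch per state."""
    rng = np.random.default_rng(seed)
    P = np.asarray(transition_matrix, dtype=float)
    out = np.empty(len(current), dtype=object)  # entries default to None
    for i, state in enumerate(states):
        mask = (current == state).to_numpy()
        # One vectorised draw for all customers currently in `state`.
        out[mask] = rng.choice(states, size=int(mask.sum()), p=P[i])
    return pd.Series(out, index=current.index)


states = ['active', 'occassional', 'dormant', 'lapsed']
transition_matrix = [[0.767, 0.082, 0.151, 0], [0.179, 0.291, 0.530, 0],
                     [0.039, 0.317, 0, 0.644], [0.004, 0.083, 0, 0.913]]

toy_current = pd.Series(['active', 'lapsed', 'dormant', 'lapsed'])
print(simulate_next_states(toy_current, states, transition_matrix, seed=0))
```

Passing a `seed` makes the simulated column reproducible between runs, which the per-row version does not guarantee.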

In [12]:
data.head()

Unnamed: 0,address_id,freq_2016,freq_2017,freq_2018,freq_2019,state_2017,state_2018,state_2019,pred_state_2019
0,3000049708318,,,,,lapsed,lapsed,lapsed,lapsed
1,3000056367699,,,,,lapsed,lapsed,lapsed,occassional
2,3000068725008,,,,,lapsed,lapsed,lapsed,lapsed
3,3000396727694,1.0,,,1.0,dormant,lapsed,occassional,lapsed
4,3000148387504,,,,,lapsed,lapsed,lapsed,lapsed


In [13]:
data['state_2019'].value_counts()

lapsed         6511953
occassional    1270240
active         1268657
dormant         949150
Name: state_2019, dtype: int64

In [14]:
data['pred_state_2019'].value_counts()

lapsed         6489038
active         1341965
occassional    1288692
dormant         880305
Name: pred_state_2019, dtype: int64

In [15]:
print(states)
confusion_matrix(data['state_2019'].values,
                 data['pred_state_2019'].values,
                 labels=states)

['active', 'occassional', 'dormant', 'lapsed']


array([[ 810407,  154223,  261377,   42650],
       [ 168609,  239197,  208594,  653840],
       [ 315199,  223617,  410334,       0],
       [  47750,  671655,       0, 5792548]], dtype=int64)

In [16]:
print(
    classification_report(data['state_2019'].values,
                          data['pred_state_2019'].values,
                          labels=states,
                          target_names=states))

              precision    recall  f1-score   support

      active       0.60      0.64      0.62   1268657
 occassional       0.19      0.19      0.19   1270240
     dormant       0.47      0.43      0.45    949150
      lapsed       0.89      0.89      0.89   6511953

    accuracy                           0.73  10000000
   macro avg       0.54      0.54      0.54  10000000
weighted avg       0.73      0.73      0.73  10000000
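
The per-class figures in the report follow directly from the confusion matrix printed earlier: rows are actual states and columns are predicted states, so recall divides the diagonal by the row sums and precision divides it by the column sums. A short sketch using the values copied from that output:

```python
import numpy as np

states = ['active', 'occassional', 'dormant', 'lapsed']
cm = np.array([[ 810407,  154223,  261377,   42650],
               [ 168609,  239197,  208594,  653840],
               [ 315199,  223617,  410334,       0],
               [  47750,  671655,       0, 5792548]])

recall = np.diag(cm) / cm.sum(axis=1)      # correct / actual count per state
precision = np.diag(cm) / cm.sum(axis=0)   # correct / predicted count per state
accuracy = np.trace(cm) / cm.sum()

for state, p, r in zip(states, precision, recall):
    print('%-12s precision=%.2f recall=%.2f' % (state, p, r))
print('accuracy=%.2f' % accuracy)
```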



In [17]:
multilabel_confusion_matrix(data['state_2019'],
                            data['pred_state_2019'],
                            labels=states)

array([[[8199785,  531558],
        [ 458250,  810407]],

       [[7680265, 1049495],
        [1031043,  239197]],

       [[8580879,  469971],
        [ 538816,  410334]],

       [[2791557,  696490],
        [ 719405, 5792548]]], dtype=int64)
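
Each 2x2 block above is a one-vs-rest view of one state, laid out by scikit-learn as `[[TN, FP], [FN, TP]]`. As a sketch, the same blocks can be rebuilt from the 4x4 confusion matrix (values copied from the earlier output; `one_vs_rest` is a hypothetical helper name):

```python
import numpy as np

cm = np.array([[ 810407,  154223,  261377,   42650],
               [ 168609,  239197,  208594,  653840],
               [ 315199,  223617,  410334,       0],
               [  47750,  671655,       0, 5792548]])


def one_vs_rest(cm):
    """Build one [[TN, FP], [FN, TP]] block per class from a confusion matrix."""
    total = cm.sum()
    blocks = []
    for i in range(cm.shape[0]):
        tp = cm[i, i]
        fn = cm[i, :].sum() - tp   # actual class i, predicted something else
        fp = cm[:, i].sum() - tp   # predicted class i, actually something else
        tn = total - tp - fn - fp
        blocks.append([[tn, fp], [fn, tp]])
    return np.array(blocks)


blocks = one_vs_rest(cm)
print(blocks[0])
```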

In [18]:
# checking how close the numbers predicted are to actual...

for state in states:
    print(
        state,
        ':',
        len(data[data['pred_state_2019'] == state]) /
        len(data[data['state_2019'] == state]))

active : 1.0577839400247664
occassional : 1.0145263887139437
dormant : 0.9274666807143234
lapsed : 0.9964810863960474


<a id='credit'></a>

Credit for code:<br>
**Alessandro Molina on Medium (Markov Chains with Python)** <br>
https://medium.com/@__amol__/markov-chains-with-python-1109663f3678