***`Business Context`***

In today's highly competitive market, maximizing **customer loyalty** is a top priority for businesses. Understanding the key **drivers** that hold the most value for customers is crucial in developing effective strategies to enhance loyalty.
<br><br>
To gain **insights** into customer loyalty, a comprehensive dataset comprising multiple surveys conducted among clients has been compiled. The primary objective is to identify the key **features and factors** that significantly influence customer loyalty.<br>
By analyzing this data, businesses can make informed decisions and tailor their offerings to better meet customer needs, ultimately fostering stronger loyalty and long-term customer relationships.

*packages*

In [38]:
# general
import psycopg2
import pandas as pd
from sqlalchemy import create_engine
import os, yaml, requests, random, time
import numpy as np
from math import pi
from IPython.display import Image
import pickle

# shap
import shap 

# DiCE
import dice_ml
from dice_ml.utils import helpers

# optimization
import pyomo.environ as pyo
from pyomo.opt import SolverFactory

# scikitlearn
from sklearn.model_selection import train_test_split
from sklearn import svm, metrics, decomposition
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import ParameterGrid
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import cross_val_predict
from sklearn.model_selection import cross_val_score
from sklearn.metrics import roc_curve
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.metrics import *

# To plot pretty figures
%matplotlib inline
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)
plt.style.use('ggplot')

***`load the model`***

In [2]:
with open('model.pkl', 'rb') as file:
    model = pickle.load(file)

In [3]:
model

In [4]:
print('estimator: ', model.estimator_)
print('classes: ', model.classes_)
print('features: ', model.feature_names_in_)
print('feature importances: ', model.feature_importances_)

estimator:  DecisionTreeClassifier()
classes:  [0 1]
features:  ['productRate' 'priceRate' 'promoRate' 'ambianceRate' 'wifiRate'
 'serviceRate' 'chooseRate']
feature importances:  [0.16051834 0.29423861 0.1143797  0.12877026 0.06616064 0.05527729
 0.18065516]


*load the data*

`train`

In [5]:
X_train = pd.read_csv('X_train.csv').rename(columns = {'Unnamed: 0':'index'}).set_index('index')
y_train = pd.read_csv('y_train.csv').rename(columns = {'Unnamed: 0':'index'}).set_index('index')

`test`

In [6]:
X_test = pd.read_csv('X_test.csv').rename(columns = {'Unnamed: 0':'index'}).set_index('index')
y_test = pd.read_csv('y_test.csv').rename(columns = {'Unnamed: 0':'index'}).set_index('index')

#### problem optimization

*`sets:`*
   - drivers: product, price, promo, ambiance, wifi, service, choose
   - surveyed: 113
   
*`parameters:`*
   - price increase
   - loyalty
   - betas

*`output:`*
   - loyalty increase

In [7]:
feature_labels = model.feature_names_in_
feature_betas = model.feature_importances_

In [8]:
# model
model_pyo = pyo.ConcreteModel()

model_pyo.sDrivers = pyo.Set(initialize = feature_labels)
model_pyo.sSurveyed = pyo.Set(initialize = [i for i in range(X_train.shape[0])])

In [9]:
df_perception = X_train.reset_index(drop = True, inplace = False)

In [10]:
df_perception.head()

Unnamed: 0,productRate,priceRate,promoRate,ambianceRate,wifiRate,serviceRate,chooseRate
0,8,6,6,6,4,6,8
1,10,2,6,6,4,6,6
2,6,4,8,6,6,6,8
3,8,8,10,8,4,6,8
4,10,4,10,10,4,8,6


In [11]:
model_pyo.pBetas = pyo.Param(model_pyo.sDrivers, initialize = dict(zip(feature_labels, feature_betas)))

In [12]:
model_pyo.pBetas.display()

pBetas : Size=7, Index=sDrivers, Domain=Any, Default=None, Mutable=False
    Key          : Value
    ambianceRate : 0.12877025843524212
      chooseRate :  0.1806551645660694
       priceRate :  0.2942386120604277
     productRate : 0.16051833695000436
       promoRate : 0.11437969881373418
     serviceRate : 0.05527729008139536
        wifiRate : 0.06616063909312678


In [13]:
def load_perception(model, survey, driver):
    # Pyomo initialize rule: rating that respondent `survey` gave to `driver`
    return df_perception.loc[survey, driver]

In [14]:
model_pyo.pPerception = pyo.Param(model_pyo.sSurveyed, 
                                  model_pyo.sDrivers, 
                                  initialize = load_perception)

In [15]:
for i in model_pyo.pPerception.items():
    print(i)
    break

((0, 'productRate'), 8)


For this example, we set aside a marketing budget of 2,500 USD to improve customer loyalty through targeted programs. Each loyalty driver can be raised by at most 40 points.
<br><br>
With seven drivers, the theoretical ceiling is 280 points (7 × 40) across all loyalty drivers, although the budget limits what is actually achievable. The goal is to spend this budget where it lifts loyalty the most.
<br><br><br>
The incremental cost required to improve each driver by 1 point is as follows:
 - *ambiance:* **100 usd**
 - *price:* **330 usd**
 - *product:* **310 usd**
 - *promo:* **160 usd**
 - *service:* **260 usd**
 - *wifi:* **265 usd**
 - *choose:* **460 usd**

In [16]:
model_pyo.pPriceIncrease = pyo.Param(model_pyo.sDrivers, 
                                 initialize = dict(zip(feature_labels, [310, 330, 160, 100, 265, 260, 460])))

In [17]:
model_pyo.pPriceIncrease.display()

pPriceIncrease : Size=7, Index=sDrivers, Domain=Any, Default=None, Mutable=False
    Key          : Value
    ambianceRate :   100
      chooseRate :   460
       priceRate :   330
     productRate :   310
       promoRate :   160
     serviceRate :   260
        wifiRate :   265


In [18]:
model_pyo.vLoyaltyIncrease = pyo.Var(model_pyo.sDrivers, 
                                     domain = pyo.NonNegativeIntegers,
                                     bounds = (0, 40))

In [19]:
model_pyo.vLoyaltyIncrease.display()

vLoyaltyIncrease : Size=7, Index=sDrivers
    Key          : Lower : Value : Upper : Fixed : Stale : Domain
    ambianceRate :     0 :  None :    40 : False :  True : NonNegativeIntegers
      chooseRate :     0 :  None :    40 : False :  True : NonNegativeIntegers
       priceRate :     0 :  None :    40 : False :  True : NonNegativeIntegers
     productRate :     0 :  None :    40 : False :  True : NonNegativeIntegers
       promoRate :     0 :  None :    40 : False :  True : NonNegativeIntegers
     serviceRate :     0 :  None :    40 : False :  True : NonNegativeIntegers
        wifiRate :     0 :  None :    40 : False :  True : NonNegativeIntegers


In [20]:
budget = 2500

model_pyo.ctBudget = pyo.Constraint(expr = pyo.quicksum(model_pyo.vLoyaltyIncrease[i]*model_pyo.pPriceIncrease[i]\
                                                        for i in model_pyo.sDrivers)<=budget)

In [21]:
model_pyo.obj = pyo.Objective(sense = pyo.maximize,
                              expr = pyo.quicksum((0.1*model_pyo.vLoyaltyIncrease[j] + model_pyo.pPerception[(i, j)])*model_pyo.pBetas[j]\
                                                 for i in model_pyo.sSurveyed for j in model_pyo.sDrivers))
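
Written out, the model built in the cells above corresponds to the following integer program. The symbols are shorthand introduced here, not names from the notebook: $S$ is `sSurveyed`, $D$ is `sDrivers`, $\beta_j$ is `pBetas`, $p_{ij}$ is `pPerception`, $c_j$ is `pPriceIncrease`, $x_j$ is `vLoyaltyIncrease`, and $B$ is `budget`.

$$
\begin{aligned}
\max_{x}\quad & \sum_{i \in S} \sum_{j \in D} \beta_j \left( p_{ij} + 0.1\, x_j \right) \\
\text{s.t.}\quad & \sum_{j \in D} c_j\, x_j \le B, \\
& x_j \in \{0, 1, \dots, 40\} \quad \forall j \in D .
\end{aligned}
$$

Because $\sum_{i}\sum_{j} \beta_j p_{ij}$ does not depend on $x$, maximizing this objective is equivalent to maximizing $0.1\,|S| \sum_{j \in D} \beta_j x_j$ under the same constraints.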

In [22]:
opt = SolverFactory("glpk")

In [23]:
opt.solve(model_pyo)

{'Problem': [{'Name': 'unknown', 'Lower bound': 650.037269950889, 'Upper bound': 650.037269950889, 'Number of objectives': 1, 'Number of constraints': 1, 'Number of variables': 8, 'Number of nonzeros': 7, 'Sense': 'maximize'}], 'Solver': [{'Status': 'ok', 'Termination condition': 'optimal', 'Statistics': {'Branch and bound': {'Number of bounded subproblems': '1', 'Number of created subproblems': '1'}}, 'Error rc': 0, 'Time': 0.07029104232788086}], 'Solution': [OrderedDict([('number of solutions', 0), ('number of solutions displayed', 0)])]}

In [24]:
model_pyo.vLoyaltyIncrease.display()

vLoyaltyIncrease : Size=7, Index=sDrivers
    Key          : Lower : Value : Upper : Fixed : Stale : Domain
    ambianceRate :     0 :  25.0 :    40 : False : False : NonNegativeIntegers
      chooseRate :     0 :   0.0 :    40 : False : False : NonNegativeIntegers
       priceRate :     0 :   0.0 :    40 : False : False : NonNegativeIntegers
     productRate :     0 :   0.0 :    40 : False : False : NonNegativeIntegers
       promoRate :     0 :   0.0 :    40 : False : False : NonNegativeIntegers
     serviceRate :     0 :   0.0 :    40 : False : False : NonNegativeIntegers
        wifiRate :     0 :   0.0 :    40 : False : False : NonNegativeIntegers


In [25]:
model_pyo.ctBudget.display()

ctBudget : Size=1
    Key  : Lower : Body   : Upper
    None :  None : 2500.0 : 2500.0
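
The allocation above can be sanity-checked without a solver. Since the objective is linear, each driver contributes a fixed gain per USD, proportional to its beta divided by its cost per point. The sketch below ranks the drivers by that ratio, using values copied from the `pBetas` and `pPriceIncrease` displays; the variable names are illustrative and not part of the notebook.

```python
# Values copied from the pBetas and pPriceIncrease displays (rounded to 8 decimals).
betas = {
    "productRate": 0.16051834, "priceRate": 0.29423861, "promoRate": 0.11437970,
    "ambianceRate": 0.12877026, "wifiRate": 0.06616064,
    "serviceRate": 0.05527729, "chooseRate": 0.18065516,
}
cost_per_point = {
    "productRate": 310, "priceRate": 330, "promoRate": 160,
    "ambianceRate": 100, "wifiRate": 265, "serviceRate": 260, "chooseRate": 460,
}
budget, max_points = 2500, 40

# Objective gain per USD. The common factor 0.1 * (number of surveyed clients)
# is omitted because it scales every driver equally and cannot change the ranking.
gain_per_usd = {d: betas[d] / cost_per_point[d] for d in betas}
ranking = sorted(gain_per_usd, key=gain_per_usd.get, reverse=True)

best = ranking[0]
affordable_points = min(max_points, budget // cost_per_point[best])
leftover = budget - affordable_points * cost_per_point[best]

for driver in ranking:
    print(f"{driver:<13} {gain_per_usd[driver]:.6f}")
print(best, affordable_points, leftover)
```

The top-ranked driver absorbs the whole budget with a whole number of points and stays under the 40-point bound, so the greedy choice is already integer and matches the solver's answer. In general a greedy ratio ranking is only guaranteed optimal for the continuous relaxation, so this is a check, not a replacement for the solver.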


Based on the results, the optimization allocates the entire budget to **ambiance**, which includes aspects like music, lighting, and furniture: 25 points at 100 USD each use all 2,500 USD, and no other driver receives funding.
<br><br>
Moreover, according to the **SHAP values**, service is the 4th most important factor for loyal clients. **Product** and **price** are important to prioritize, but their SHAP values trend negatively with the target. Relying on intuition rather than on the solver output alone, we **propose** allocating **62%** of the budget to **ambiance**, **18%** to **price** adjustments, and **18%** to **product** enhancements.
<br>
This allocation strategy aims to strike a balance among critical factors, emphasizing the ambiance while ensuring the product and price are also adequately addressed to further boost customer loyalty.
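
To see what these budget shares buy, the sketch below converts each share into whole points using the cost table from earlier. It assumes that points can only be bought in whole units and that each share is spent only on its own driver; both are assumptions made for illustration.

```python
budget = 2500

# Cost per point, taken from the cost table above.
cost_per_point = {"ambianceRate": 100, "priceRate": 330, "productRate": 310}

# Proposed budget shares in percent.
share_pct = {"ambianceRate": 62, "priceRate": 18, "productRate": 18}

points = {}
spent = 0
for driver, pct in share_pct.items():
    allotted = budget * pct // 100                         # USD assigned to this driver
    points[driver] = allotted // cost_per_point[driver]    # whole points it can buy
    spent += points[driver] * cost_per_point[driver]

unspent = budget - spent
print(points, spent, unspent)
```

Under these assumptions part of the budget stays unspent, because the price and product shares each cover only one whole point.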

In [26]:
print(feature_labels)

['productRate' 'priceRate' 'promoRate' 'ambianceRate' 'wifiRate'
 'serviceRate' 'chooseRate']


In [27]:
increments = [1, 1, 1, 2.5, 0, 0, 0]

In [28]:
y_test.reset_index(drop= True, inplace = True)

In [29]:
X_test.reset_index(drop= True, inplace = True)

In [30]:
X_inc = X_test + increments
X_inc.head()

Unnamed: 0,productRate,priceRate,promoRate,ambianceRate,wifiRate,serviceRate,chooseRate
0,9.0,7.0,9.0,10.5,6.0,8.0,8.0
1,7.0,7.0,9.0,6.5,4.0,6.0,6.0
2,11.0,9.0,11.0,12.5,6.0,10.0,10.0
3,7.0,5.0,9.0,10.5,8.0,10.0,8.0
4,9.0,3.0,9.0,12.5,6.0,6.0,8.0


In [31]:
X_test.head()

Unnamed: 0,productRate,priceRate,promoRate,ambianceRate,wifiRate,serviceRate,chooseRate
0,8,6,8,8,6,8,8
1,6,6,8,4,4,6,6
2,10,8,10,10,6,10,10
3,6,4,8,8,8,10,8
4,8,2,8,10,6,6,8


In [32]:
X_inc_adj = pd.DataFrame(data = np.where(X_inc > 10.0, 10.0, X_inc), columns = feature_labels)

In [33]:
pd.DataFrame(model.predict(X_inc_adj), columns = ['loyal']).value_counts()

loyal
0        22
1         1
dtype: int64

In [34]:
y_test.value_counts()

loyal
0        20
1         3
dtype: int64

Upon reviewing the results, loyalty hinges on very few individuals: after applying the increments the model predicts only 1 loyal client, whereas the test labels contain 3. Losing just two loyal clients is a decline of about 67%, which is not favorable.
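
The size of that decline follows directly from the two `value_counts()` outputs above. A minimal check, with counts copied from those outputs:

```python
loyal_in_labels = 3   # y_test: clients labelled loyal
loyal_predicted = 1   # model.predict(X_inc_adj): clients predicted loyal

lost = loyal_in_labels - loyal_predicted
relative_drop = lost / loyal_in_labels
print(f"lost {lost} of {loyal_in_labels} loyal clients ({relative_drop:.1%})")
```

Note that this compares predictions on the adjusted data with the true labels. A like-for-like baseline would be the model's predictions on the unmodified `X_test`, which this notebook does not compute.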

One interesting observation is the peculiar relationship between price and loyalty. It seems that loyal clients may grade the price negatively, yet the model identifies that loyal clients are associated with higher prices. This warrants further investigation to understand the underlying dynamics.

Additionally, the limited amount of data is a notable factor. Having more data would enable the model to generalize more effectively and consider not only the current features but also additional variables. For instance, loyalty in a cafeteria environment may vary based on the location, and more data would help capture such nuances.

Overall, there are several aspects to consider, such as the relationship between price and loyalty, the need for more data, and the potential influence of other factors. A deeper analysis and additional data could lead to more accurate and robust results.

### What if

In [66]:
Image(url="what_if.webp", width=700, height=600)

In the previous section, we devised what we believed to be the best strategy for increasing loyalty. However, contrary to our expectations, simulating this strategy on the test set left only 1 predicted loyal client against 3 in the labels, a decrease of about 67%, which is concerning.

Now, the critical question is: 
- What steps should we take to effectively boost loyalty?

To find an answer, we have adopted the approach of using [Diverse Counterfactual Explanations (DiCE)](https://medium.com/towards-data-science/dice-diverse-counterfactual-explanations-for-hotel-cancellations-762c311b2c64). This technique allows us to create various scenarios where regular customers transform into loyal clients. By exploring these different scenarios, we can gain valuable insights into the factors and strategies that can genuinely drive customer loyalty and inform our future decision-making process.
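
The idea behind a random counterfactual search can be sketched in a few lines of plain Python. This is a simplified illustration, not DiCE's actual implementation: `toy_predict` is a hypothetical stand-in classifier invented for this example, and `random_counterfactuals` is an illustrative helper, not a library function.

```python
import random

def toy_predict(row):
    """Hypothetical stand-in classifier (not the notebook's model): 1 = loyal."""
    return 1 if row["productRate"] + row["ambianceRate"] >= 18 else 0

def random_counterfactuals(query, predict, features_to_vary, total_cfs=3,
                           low=1, high=10, max_tries=10_000, seed=0):
    """Resample the allowed features at random until `total_cfs` distinct
    candidates flip the predicted class (or `max_tries` is exhausted)."""
    rng = random.Random(seed)
    target = 1 - predict(query)  # the "opposite" class in a binary problem
    found = []
    for _ in range(max_tries):
        candidate = dict(query)
        for name in features_to_vary:
            candidate[name] = rng.randint(low, high)
        if predict(candidate) == target and candidate not in found:
            found.append(candidate)
            if len(found) == total_cfs:
                break
    return found

query = {"productRate": 8, "priceRate": 6, "promoRate": 8, "ambianceRate": 8,
         "wifiRate": 6, "serviceRate": 8, "chooseRate": 8}
varied = ["productRate", "priceRate", "ambianceRate"]
cfs = random_counterfactuals(query, toy_predict, varied)

for cf in cfs:
    # Show only the features that differ from the query instance
    print({k: v for k, v in cf.items() if v != query[k]})
```

Each returned scenario keeps the features outside `varied` fixed and changes only the allowed ones, which mirrors the role of `features_to_vary` in the DiCE call used below.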

In [41]:
df_train = pd.concat([X_train, y_train], axis = 1)

In [44]:
df_train.head(1)

index,productRate,priceRate,promoRate,ambianceRate,wifiRate,serviceRate,chooseRate,loyal
31,8,6,6,6,4,6,8,0


In [50]:
d = dice_ml.Data(dataframe = df_train, 
                 continuous_features = list(feature_labels),  
                 outcome_name='loyal')

In [52]:
m = dice_ml.Model(model=model, backend="sklearn")

exp = dice_ml.Dice(d, m, method="random")

In [54]:
X_test.iloc[0]

productRate     8
priceRate       6
promoRate       8
ambianceRate    8
wifiRate        6
serviceRate     8
chooseRate      8
Name: 0, dtype: int64

In [55]:
y_test.iloc[0]

loyal    0
Name: 0, dtype: int64

In [61]:
e1 = exp.generate_counterfactuals(X_test[0:3],  
                                  total_CFs=3, 
                                  features_to_vary=['productRate', 'priceRate', 'ambianceRate'],
                                  desired_class="opposite")

100%|██████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00,  1.13it/s]


In [67]:
e1.visualize_as_dataframe(show_only_changes=True)

Query instance (original outcome : 0)


Unnamed: 0,productRate,priceRate,promoRate,ambianceRate,wifiRate,serviceRate,chooseRate,loyal
0,8,6,8,8,6,8,8,0



Diverse Counterfactual set (new outcome: 1.0)


Unnamed: 0,productRate,priceRate,promoRate,ambianceRate,wifiRate,serviceRate,chooseRate,loyal
0,3,3,-,-,-,-,-,1
1,3,4,-,-,-,-,-,1
2,4,5,-,-,-,-,-,1


Query instance (original outcome : 0)


Unnamed: 0,productRate,priceRate,promoRate,ambianceRate,wifiRate,serviceRate,chooseRate,loyal
0,6,6,8,4,4,6,6,0



Diverse Counterfactual set (new outcome: 1.0)


Unnamed: 0,productRate,priceRate,promoRate,ambianceRate,wifiRate,serviceRate,chooseRate,loyal
0,3,3,-,-,-,-,-,1
1,-,3,-,5,-,-,-,1
2,-,3,-,6,-,-,-,1


Query instance (original outcome : 0)


Unnamed: 0,productRate,priceRate,promoRate,ambianceRate,wifiRate,serviceRate,chooseRate,loyal
0,10,8,10,10,6,10,10,0



Diverse Counterfactual set (new outcome: 1.0)


Unnamed: 0,productRate,priceRate,promoRate,ambianceRate,wifiRate,serviceRate,chooseRate,loyal
0,2,3,-,-,-,-,-,1
1,-,4,-,5,-,-,-,1
2,4,2,-,-,-,-,-,1


The counterfactuals point in a counterintuitive direction: every scenario lowers `priceRate`, and most also lower `productRate`, in order to turn a client loyal. Deliberately lowering ratings is not a sensible action, so additional features are needed to enhance loyalty effectively. Candidates include the proximity of the location to clients and the time spent waiting in line to order. Revisiting the baseline and the way loyalty is defined could also change the picture significantly.
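
The direction of the suggested changes can be tallied directly from the counterfactual tables above. The sketch below copies the three query instances and their counterfactuals for the varied features and counts how often each feature moves up or down; the variable names are illustrative.

```python
from collections import Counter

# Query instances (only the features that were allowed to vary).
queries = [
    {"productRate": 8, "priceRate": 6, "ambianceRate": 8},
    {"productRate": 6, "priceRate": 6, "ambianceRate": 4},
    {"productRate": 10, "priceRate": 8, "ambianceRate": 10},
]

# Changed values per counterfactual, copied from the tables ("-" entries omitted).
counterfactuals = [
    [{"productRate": 3, "priceRate": 3}, {"productRate": 3, "priceRate": 4},
     {"productRate": 4, "priceRate": 5}],
    [{"productRate": 3, "priceRate": 3}, {"priceRate": 3, "ambianceRate": 5},
     {"priceRate": 3, "ambianceRate": 6}],
    [{"productRate": 2, "priceRate": 3}, {"priceRate": 4, "ambianceRate": 5},
     {"productRate": 4, "priceRate": 2}],
]

directions = Counter()
for query, cfs in zip(queries, counterfactuals):
    for cf in cfs:
        for feature, new_value in cf.items():
            direction = "down" if new_value < query[feature] else "up"
            directions[(feature, direction)] += 1

for key in sorted(directions):
    print(key, directions[key])
```

DiCE's `generate_counterfactuals` also accepts a `permitted_range` argument. Restricting each feature to values at or above the client's current rating might be one way to request only improvements, though that was not tried in this notebook.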

By taking these additional aspects into account, we can gain a more comprehensive understanding of the factors influencing loyalty and devise a well-rounded strategy to foster stronger customer loyalty and satisfaction. This will enable us to make informed decisions that align with our objectives and enhance overall customer experience.