# RFM builder by a data-driven paradigm
__Author:__ Moyocoyani Molina-Espíritu

__Date:__ 27/12/2018

Customer segments obtained by an RFM analysis are well defined. [Anish Nair](https://www.putler.com/rfm-analysis/) gives specifications for managing 11 segments; if you prefer fewer segments, you can follow the article written by [Pushpa Makhija](https://clevertap.com/blog/rfm-analysis/).

Below we write code that obtains the 11 segments listed by [Anish Nair](https://www.putler.com/rfm-analysis/). It is worth mentioning that we proceed by generating an RFM partition based on quintiles.

The main idea is that you use this code with a dataset in which the _recency_, _frequency_ and _monetary_ (or engagement) variables are well defined. The code then generates the corresponding label for each customer in your dataset.
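The segment boundaries used in this notebook run from 0.0 to 1.0, so a real dataset has to be mapped onto that scale before it can be labelled. The sketch below is one possible way to do it, under my own assumptions: the column names (`recency_days`, `n_purchases`, `total_spent`) and the helper `to_percentile_space` are hypothetical, and I assume that the boundaries are percentile ranks in which a low Recency value means a recent purchase.

```python
import pandas as pd

# Hypothetical raw RFM table: one row per customer.
raw = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "recency_days": [3, 40, 10, 200, 75],            # days since the last purchase
    "n_purchases": [12, 2, 7, 1, 4],                 # purchases in the period
    "total_spent": [900.0, 50.0, 300.0, 20.0, 120.0],
})

def to_percentile_space(df):
    """Map the raw RFM columns to percentile ranks in (0, 1] (assumed scaling)."""
    return pd.DataFrame({
        "Recency": df["recency_days"].rank(pct=True),
        "Frequency": df["n_purchases"].rank(pct=True),
        "Monetary": df["total_spent"].rank(pct=True),
    }, index=df.index)

scaled = to_percentile_space(raw)
print(scaled)
```

With this scaling, each quintile covers a band of width 0.2, which is the granularity used by the boundaries later on.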

I call this version the data-driven paradigm because this is the approach I work with. Basically, I build a cubic space where the different labels are delimited by predefined boundaries. Next, a few classifiers are trained, and the best of them is selected. This classifier is in charge of assigning the corresponding label to each customer in the dataset.

We require the numpy and pandas libraries (scikit-learn is imported later, when we train the classifier).

In [0]:
import numpy as np
import pandas as pd

## Creation of the theoretical cubic space (training set)
First we need to create a theoretical space with well-defined boundaries. We call this space the Local Space, and it is going to be our training dataset.

`LocalSpace()` generates a cubic grid of $N \times N \times N$ points, where $N$ is named `totalCostumers`. The total number of points, $N^3$, is stored in the `self.shapeReFactor` variable and is used to flatten the grid.

In [0]:
class LocalSpace():
  
    def __init__(self,label,totalCostumers,x,y,z):
        self._label=label
        self._noCostumers = totalCostumers
        self._x= x
        self._y= y
        self._z= z
        self.shapeReFactor=totalCostumers*totalCostumers*totalCostumers

    def CreateCoordinates(self): #Coordinates correspond to recency, frequency and monetary variables
        x_ = np.linspace(self._x[0],self._x[1],self._noCostumers)
        y_ = np.linspace(self._y[0],self._y[1],self._noCostumers)
        z_ = np.linspace(self._z[0],self._z[1],self._noCostumers)
        return(x_,y_,z_)

    def CreateGrids4Segment(self):
        return np.meshgrid(self.CreateCoordinates()[0],self.CreateCoordinates()[1],self.CreateCoordinates()[2])

    def Reshape_(self,matrix_):
        return matrix_.reshape(self.shapeReFactor)

    def CreateLocalSpace(self):
        return pd.DataFrame({'Recency': self.Reshape_(self.CreateGrids4Segment()[0]),
                             'Frequency': self.Reshape_(self.CreateGrids4Segment()[1]),
                             'Monetary': self.Reshape_(self.CreateGrids4Segment()[2]),
                             'Label': [self._label]*self.shapeReFactor})
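To get a feeling for what a local space looks like, here is a minimal, self-contained sketch of the same linspace/meshgrid idea with a small $N$. The helper `grid_points` is a hypothetical stand-in written for this illustration, not part of the classes in this notebook.

```python
import numpy as np
import pandas as pd

def grid_points(label, n, x, y, z):
    """Return the n**3 grid points of one labelled box (illustrative helper)."""
    xs = np.linspace(x[0], x[1], n)
    ys = np.linspace(y[0], y[1], n)
    zs = np.linspace(z[0], z[1], n)
    gx, gy, gz = np.meshgrid(xs, ys, zs)   # three arrays of shape (n, n, n)
    return pd.DataFrame({
        "Recency": gx.ravel(),
        "Frequency": gy.ravel(),
        "Monetary": gz.ravel(),
        "Label": label,                    # the same label for every point
    })

# The 'champions' box with 3 points per axis gives 3*3*3 = 27 rows.
box = grid_points("champions", 3, (0.0, 0.2), (0.8, 1.0), (0.8, 1.0))
print(box.shape)   # (27, 4)
print(box.head())
```

Every row is one synthetic customer placed inside the box, so a larger $N$ simply samples the same region more densely.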

The following class holds the values required to create the theoretical cubic space with arbitrary boundaries.

`builderCubicSpace` calls `LocalSpace` and creates the cubic space with these attributes. This is the part of the code that you can modify if you want to add or remove labels, or change the boundaries, either to other percentiles or to customized limits.

In [0]:
class RFMlabels():
  
  # We require the dataset and predefined labels for the automatic generation.
  # Since we decided to use quintiles, we can predefine the limits in recency,
  # frequency and monetary for every label.
  # From these limits we are able to build the abstract customer space.
  def __init__(self):
    
    self.labels= ['champions','loyal customers','potential loyalist','new customers',
           'promising','need attention','about to sleep','at risk',
           'cant lose them','hibernating','lost']
    self.x = np.array([[0.0,0.2],
                         [0.0,0.6],
                         [0.0,0.4],
                         [0.0,0.2],
                         [0.2,0.4],
                         [0.4,0.6],
                         [0.4,0.6],
                         [0.6,1.0],
                         [0.8,1.0],
                         [0.6,0.8],
                         [0.6,1.0]])
    self.y = np.array([[0.8,1.0],
                           [0.6,1.0],
                           [0.2,0.6],
                           [0.0,0.2],
                           [0.0,0.2],
                           [0.4,0.6],
                           [0.0,0.4],
                           [0.4,1.0],
                           [0.8,1.0],
                           [0.2,0.4],
                           [0.0,0.4]])
    self.z = np.array([[0.8,1.0],
                           [0.6,1.0],
                           [0.2,0.6],
                           [0.0,0.2],
                           [0.0,0.2],
                           [0.4,0.6],
                           [0.0,0.4],
                           [0.4,1.0],
                           [0.8,1.0],
                           [0.2,0.4],
                           [0.0,0.4]])
    
    self.CubicSpace = pd.DataFrame({'Recency': [],
                                'Frequency': [],
                                'Monetary': [],
                                'Label': []})

  
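  def builderCubicSpace(self):
    # DataFrame.append was removed in pandas 2.0, so we build one local space
    # per label (20 points per axis) and concatenate them in a single call.
    spaces = [LocalSpace(self.labels[i],20,self.x[i],self.y[i],self.z[i]).CreateLocalSpace()
              for i in range(len(self.labels))]
    self.CubicSpace = pd.concat(spaces, ignore_index=True)
    return self.CubicSpace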
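If you change the boundaries, it can be useful to check whether two segments share part of the cube, because points in a shared region enter the training set under two different labels and no classifier can separate them perfectly. The following is a minimal sketch of such a check. The `boxes` dictionary repeats the limits defined in `RFMlabels`, while the `overlaps` helper is my own addition for illustration.

```python
from itertools import combinations

# Limits copied from RFMlabels: label -> (recency, frequency, monetary) ranges.
boxes = {
    'champions':          ((0.0, 0.2), (0.8, 1.0), (0.8, 1.0)),
    'loyal customers':    ((0.0, 0.6), (0.6, 1.0), (0.6, 1.0)),
    'potential loyalist': ((0.0, 0.4), (0.2, 0.6), (0.2, 0.6)),
    'new customers':      ((0.0, 0.2), (0.0, 0.2), (0.0, 0.2)),
    'promising':          ((0.2, 0.4), (0.0, 0.2), (0.0, 0.2)),
    'need attention':     ((0.4, 0.6), (0.4, 0.6), (0.4, 0.6)),
    'about to sleep':     ((0.4, 0.6), (0.0, 0.4), (0.0, 0.4)),
    'at risk':            ((0.6, 1.0), (0.4, 1.0), (0.4, 1.0)),
    'cant lose them':     ((0.8, 1.0), (0.8, 1.0), (0.8, 1.0)),
    'hibernating':        ((0.6, 0.8), (0.2, 0.4), (0.2, 0.4)),
    'lost':               ((0.6, 1.0), (0.0, 0.4), (0.0, 0.4)),
}

def overlaps(a, b):
    """True when two boxes share a region of positive volume.

    Boxes that only touch on a face or an edge are not counted.
    """
    return all(max(lo_a, lo_b) < min(hi_a, hi_b)
               for (lo_a, hi_a), (lo_b, hi_b) in zip(a, b))

overlapping = [(p, q) for p, q in combinations(boxes, 2)
               if overlaps(boxes[p], boxes[q])]
for p, q in overlapping:
    print(p, '<->', q)
```

Boxes that merely touch still share the grid points on their common face, so this check only reports the larger, volumetric overlaps.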
  

## Training the classifier
Once you have created the training set, with its well-delimited boundaries, the only thing left to do is to train a model.

In practice, the model learns the boundaries and assigns the corresponding label. For this case, and for simplicity, we are going to train a decision tree with some arbitrary hyperparameters. You are free to train whichever model you want, or even to run a grid search.

In [0]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.tree import DecisionTreeClassifier

df = RFMlabels().builderCubicSpace()
# Hold out 25% of the points for validation, then 25% of the remainder for testing.
X_train, X_validation, y_train, y_validation = train_test_split(df[['Recency','Frequency','Monetary']],
                                                                df['Label'], test_size=0.25, random_state=42)
X_train,X_test,y_train,y_test = train_test_split(X_train,y_train,test_size=0.25,random_state=68)

classifier_ = DecisionTreeClassifier(criterion='entropy', max_depth= 10, min_samples_split= 3)

classifier_.fit(X_train,y_train)

#Test
y_true, y_pred = y_test, classifier_.predict(X_test)
print(classification_report(y_true, y_pred))
print('**********************************************************')

#Validation
y_true, y_pred = y_validation, classifier_.predict(X_validation)
print(classification_report(y_true, y_pred))
print('**********************************************************')

                    precision    recall  f1-score   support

    about to sleep       0.95      0.96      0.95      1492
           at risk       1.00      0.91      0.95      1623
    cant lose them       0.93      1.00      0.97      1451
         champions       0.91      1.00      0.95      1541
       hibernating       0.88      1.00      0.93      1439
              lost       0.97      0.85      0.91      1521
   loyal customers       0.98      0.90      0.94      1492
    need attention       0.98      1.00      0.99      1494
     new customers       0.98      0.96      0.97      1520
potential loyalist       1.00      0.97      0.98      1471
         promising       0.95      0.98      0.96      1456

         micro avg       0.96      0.96      0.96     16500
         macro avg       0.96      0.96      0.96     16500
      weighted avg       0.96      0.96      0.96     16500

**********************************************************
                    precision    recal
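The grid search mentioned earlier could be sketched as follows. This is only an illustration under my own assumptions: it uses scikit-learn's `GridSearchCV`, a small stand-in training set made of two boxes (built with the hypothetical helper `box`), and a parameter grid that I picked arbitrarily.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

def box(label, n, x, y, z):
    """Grid of n**3 labelled points inside one box (illustrative helper)."""
    gx, gy, gz = np.meshgrid(np.linspace(*x, n), np.linspace(*y, n), np.linspace(*z, n))
    return pd.DataFrame({'Recency': gx.ravel(), 'Frequency': gy.ravel(),
                         'Monetary': gz.ravel(), 'Label': label})

# Small stand-in for the cubic space: two boxes that do not overlap.
df_demo = pd.concat([box('new customers', 6, (0.0, 0.2), (0.0, 0.2), (0.0, 0.2)),
                     box('cant lose them', 6, (0.8, 1.0), (0.8, 1.0), (0.8, 1.0))],
                    ignore_index=True)

# Arbitrary candidate values; adjust them to your own problem.
param_grid = {'criterion': ['gini', 'entropy'],
              'max_depth': [3, 5, 10],
              'min_samples_split': [2, 3, 5]}

# 3-fold cross-validation over every combination in param_grid.
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=3)
search.fit(df_demo[['Recency', 'Frequency', 'Monetary']], df_demo['Label'])

print(search.best_params_)
print(search.best_score_)
```

To use it with the full training set, you would replace `df_demo` with the output of `RFMlabels().builderCubicSpace()`; `search.best_estimator_` then plays the role of the trained classifier.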

## Saving your local classifier
If you are on a local machine, you can save your classifier with the pickle library and run the model any time you want. This way you don't have to retrain the classifier for every run and every dataset: just load the saved model and that's it.

In [0]:
import pickle
filename = 'RFPredictor.sav'
# classifier_ is the decision tree trained above
with open(filename, 'wb') as f:
    pickle.dump(classifier_, f)
 
# some time later...
 
# load the model from disk
#with open(filename, 'rb') as f:
#    loaded_model = pickle.load(f)
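Once the model is loaded, labelling a dataset is a single `predict` call. The sketch below shows the full round trip under my own assumptions: a tiny stand-in model replaces the trained classifier, the file name and the customers are hypothetical, and the customer values are assumed to be already scaled to the 0 to 1 range used by the boundaries.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

features = ['Recency', 'Frequency', 'Monetary']

# Stand-in for the trained classifier: two labelled corners of the cube.
train = pd.DataFrame({'Recency':   [0.1, 0.1, 0.9, 0.9],
                      'Frequency': [0.9, 0.8, 0.1, 0.2],
                      'Monetary':  [0.9, 0.8, 0.1, 0.2],
                      'Label': ['champions', 'champions', 'lost', 'lost']})
model = DecisionTreeClassifier(random_state=0).fit(train[features], train['Label'])

# Save and reload the model; the 'with' blocks close the files for us.
path = Path(tempfile.gettempdir()) / 'RFPredictor_demo.sav'   # hypothetical file name
with open(path, 'wb') as f:
    pickle.dump(model, f)
with open(path, 'rb') as f:
    loaded_model = pickle.load(f)

# Hypothetical customers, already scaled to the [0, 1] cube.
customers = pd.DataFrame({'Recency':   [0.05, 0.95],
                          'Frequency': [0.85, 0.10],
                          'Monetary':  [0.95, 0.05]},
                         index=['customer_a', 'customer_b'])
customers['Label'] = loaded_model.predict(customers[features])
print(customers)
```

Keep in mind that a pickled model should only be loaded from a source you trust, and ideally with the same scikit-learn version that was used to save it.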

This is what the entire code looks like:

In [0]:
import numpy as np
import pandas as pd

class LocalSpace():
  
    def __init__(self,label,totalCostumers,x,y,z):
        self._label=label
        self._noCostumers = totalCostumers
        self._x= x
        self._y= y
        self._z= z
        self.shapeReFactor=totalCostumers*totalCostumers*totalCostumers

    def CreateCoordinates(self):
        x_ = np.linspace(self._x[0],self._x[1],self._noCostumers)
        y_ = np.linspace(self._y[0],self._y[1],self._noCostumers)
        z_ = np.linspace(self._z[0],self._z[1],self._noCostumers)
        return(x_,y_,z_)

    def CreateGrids4Segment(self):
        return np.meshgrid(self.CreateCoordinates()[0],self.CreateCoordinates()[1],self.CreateCoordinates()[2])

    def Reshape_(self,matrix_):
        return matrix_.reshape(self.shapeReFactor)

    def CreateLocalSpace(self):
        return pd.DataFrame({'Recency': self.Reshape_(self.CreateGrids4Segment()[0]),
                             'Frequency': self.Reshape_(self.CreateGrids4Segment()[1]),
                             'Monetary': self.Reshape_(self.CreateGrids4Segment()[2]),
                             'Label': [self._label]*self.shapeReFactor})
      
class RFMlabels():
  
  # We require the dataset and predefined labels for the automatic generation.
  # Since we decided to use quintiles, we can predefine the limits in recency,
  # frequency and monetary for every label.
  # From these limits we are able to build the abstract customer space.
  def __init__(self):
    
    self.labels= ['champions','loyal customers','potential loyalist','new customers',
           'promising','need attention','about to sleep','at risk',
           'cant lose them','hibernating','lost']
    self.x = np.array([[0.0,0.2],
                         [0.0,0.6],
                         [0.0,0.4],
                         [0.0,0.2],
                         [0.2,0.4],
                         [0.4,0.6],
                         [0.4,0.6],
                         [0.6,1.0],
                         [0.8,1.0],
                         [0.6,0.8],
                         [0.6,1.0]])
    self.y = np.array([[0.8,1.0],
                           [0.6,1.0],
                           [0.2,0.6],
                           [0.0,0.2],
                           [0.0,0.2],
                           [0.4,0.6],
                           [0.0,0.4],
                           [0.4,1.0],
                           [0.8,1.0],
                           [0.2,0.4],
                           [0.0,0.4]])
    self.z = np.array([[0.8,1.0],
                           [0.6,1.0],
                           [0.2,0.6],
                           [0.0,0.2],
                           [0.0,0.2],
                           [0.4,0.6],
                           [0.0,0.4],
                           [0.4,1.0],
                           [0.8,1.0],
                           [0.2,0.4],
                           [0.0,0.4]])
    
    self.CubicSpace = pd.DataFrame({'Recency': [],
                                'Frequency': [],
                                'Monetary': [],
                                'Label': []})

  
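  def builderCubicSpace(self):
    # DataFrame.append was removed in pandas 2.0, so we build one local space
    # per label (20 points per axis) and concatenate them in a single call.
    spaces = [LocalSpace(self.labels[i],20,self.x[i],self.y[i],self.z[i]).CreateLocalSpace()
              for i in range(len(self.labels))]
    self.CubicSpace = pd.concat(spaces, ignore_index=True)
    return self.CubicSpace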

#*************************************************************************************#  
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.tree import DecisionTreeClassifier

df = RFMlabels().builderCubicSpace()
# Hold out 25% of the points for validation, then 25% of the remainder for testing.
X_train, X_validation, y_train, y_validation = train_test_split(df[['Recency','Frequency','Monetary']],
                                                                df['Label'], test_size=0.25, random_state=42)
X_train,X_test,y_train,y_test = train_test_split(X_train,y_train,test_size=0.25,random_state=68)

classifier_ = DecisionTreeClassifier(criterion='entropy', max_depth= 10, min_samples_split= 3)

classifier_.fit(X_train,y_train)

#Test
y_true, y_pred = y_test, classifier_.predict(X_test)
print(classification_report(y_true, y_pred))
print('**********************************************************')

#Validation
y_true, y_pred = y_validation, classifier_.predict(X_validation)
print(classification_report(y_true, y_pred))
print('**********************************************************')

If you want to see this code in action, with further explanation, don't forget to click [on this link]() to see it on Kaggle.

Or, if you want some explanation of the simplicity and foundations of an RFM analysis, you can read [this post on Medium (Spanish version)]().

If you are an analyst and want to take the results of an RFM analysis and put them together in a dashboard, you can visit [this link, where the results from the Kaggle implementation are shown in a Tableau dashboard]().