# **Introduction**

CRM (customer relationship management) analytics comprises all of the programming that analyzes data about customers and presents it to an organization to help facilitate and streamline better business decisions.

CRMs were originally designed to target large corporations, but the internet has allowed small business owners to take advantage of these tools as well. Customer data is collected in a CRM database, which allows for advanced analysis such as customer segmentation and contact history.

Applications of CRM Analytics 
* Customer Segmentation Groups
* Profitability Analysis and Customer Value
* Personalization
* Measuring and Tracking Escalation
* Predictive Modeling

In this notebook, we show how to use CRM data to analyze a customer base and support better-targeted marketing campaigns, with the goal of increasing revenue.

Content:

1. CRM - Customer Relationship Management
2. Key Performance Indicators
3. Cohort Analysis
4. Business Problem
* Dataset Story
* Variables
5. Libraries
6. Load and Check Data
* Data Preprocessing
* Outlier Observations
7. Exploratory Data Analysis
* Categorical Variables
* Numerical Variables
8. Customer Segmentation With RFM
* Preparation of RFM Metrics
* Generating RFM Scores
* Segmenting Customers Based on RFM Scores
* Visualization of RFM Segments
9. CLTV - Customer Lifetime Value
* Preparation Data Structure of CLTV
* BG - NBD Model
* Gamma Gamma Model
* BG - NBD and GG Model For Prediction
* Segmentation on CLTV Forecasts
10. References

# **Customer Relationship Management**

**Definition of CRM**

Customer relationship management (CRM) is a process in which a business or other organization administers its interactions with customers, typically using data analysis to study large amounts of information.

**CRM Analytics**

* Customer lifecycle optimization: Refers to the customer's journey, which starts at the first contact with the company. Initial points of contact include the registration process, social media posts, etc.
* Internal and external communications: language, colour, images, campaigns
* Customer acquisition
* Customer retention (abandonment/churn)
* Cross-sell, up-sell
* Customer segmentation studies: strategy development studies that divide customers into groups.

Purpose: To make the entire customer relations process more efficient based on data. In practice this means optimizing processes so that effort, time, and strategy development achieve more with less work.



**KPIs - KEY PERFORMANCE INDICATORS**

A key performance indicator (KPI) is a performance measurement that assesses the success of an organization or a specific activity (such as projects, programs, products, and other initiatives). KPIs evaluate how efficiently an organization performs the activities in which it participates.
* Customer Acquisition Rate
* Customer Retention Rate
* Customer Churn Rate
* Conversion Rate
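
As a rough illustration of the indicators listed above, the sketch below computes them from made-up counts. The formulas are common textbook definitions, and every number and variable name is an assumption for illustration; organizations often define these rates differently.

```python
# Hypothetical counts for one month (illustrative numbers only)
visitors = 2000            # people who visited the shop
new_customers = 100        # visitors who made a first purchase
customers_at_start = 500   # existing customers at the start of the month
customers_lost = 25        # existing customers who stopped buying

# Conversion rate: share of visitors who became customers
conversion_rate = new_customers / visitors
# Acquisition rate: new customers relative to the existing base (one possible definition)
acquisition_rate = new_customers / customers_at_start
# Churn rate: share of the starting customers who were lost
churn_rate = customers_lost / customers_at_start
# Retention rate: the complement of churn
retention_rate = 1 - churn_rate

print(f"conversion={conversion_rate:.1%} acquisition={acquisition_rate:.1%} "
      f"churn={churn_rate:.1%} retention={retention_rate:.1%}")
```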


**Cohort Analysis**
Cohort analysis is a type of behavioral analytics that separates the data in a data set into comparable groups before analysis. These units, or cohorts, are usually characterized by similar qualities or events over a specific timespan.

Analysis of a customer's (or user's) behavior across the lifecycle might reveal important trends. By breaking down customers into smaller groups, you can better see patterns throughout each customer's life cycle rather than just looking at all clients uniformly without regard for the natural cycle that a client goes through.

Cohort: A group of people with common characteristics.

Cohort Analysis: It is the analysis of the behavior of a group of people with common characteristics.
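
To make the idea concrete, here is a minimal sketch of a cohort table in plain Python. It assumes that a cohort is defined by the month of a customer's first purchase; the purchase log and all names are hypothetical and are not taken from the dataset used later.

```python
from collections import defaultdict

# Hypothetical purchase log: (customer_id, "YYYY-MM" of purchase)
purchases = [
    ("a", "2011-01"), ("a", "2011-02"), ("a", "2011-03"),
    ("b", "2011-01"), ("b", "2011-03"),
    ("c", "2011-02"), ("c", "2011-03"),
    ("d", "2011-02"),
]

def month_index(ym):
    """Convert "YYYY-MM" to a running month number."""
    year, month = ym.split("-")
    return int(year) * 12 + int(month)

# Cohort = month of each customer's first purchase
first_month = {}
for customer, ym in purchases:
    if customer not in first_month or ym < first_month[customer]:
        first_month[customer] = ym

# For each cohort, collect the distinct customers active N months after joining
active = defaultdict(set)
for customer, ym in purchases:
    cohort = first_month[customer]
    offset = month_index(ym) - month_index(cohort)
    active[(cohort, offset)].add(customer)

cohort_table = {key: len(customers) for key, customers in sorted(active.items())}
for (cohort, offset), count in cohort_table.items():
    print(cohort, f"month+{offset}", count)
```

Reading one cohort's counts across offsets shows how many of its customers are still buying one, two, or more months after their first purchase.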

**Business Problem**

An e-commerce company wants to segment its customers and determine marketing strategies according to these segments. For example, it wants to run one set of campaigns for new customers and a different set of campaigns to retain customers that are very profitable for the company.


**Dataset Story**

* The dataset includes sales between 01/12/2009 - 09/12/2011.
* In this project, the years 2010-2011 will be examined.
* The product catalog of this company includes souvenirs.
* The vast majority of the company's customers are corporate customers.

**Variables**

* **Invoice:** Invoice number. A unique number for each transaction (invoice). If it starts with C, the transaction was cancelled.
* **StockCode:** Product code. Unique number for each product.
* **Description:** Product name
* **Quantity:** Number of products. It expresses how many units of the product were sold on the invoice.
* **InvoiceDate:** Invoice date and time.
* **Price:** Unit price of the product (in GBP)
* **Customer ID:** Unique customer number
* **Country:** The country where the customer lives.

# LIBRARIES

In [None]:
#installation required

!pip install lifetimes
!pip install openpyxl

#libraries
import datetime as dt
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from lifetimes import BetaGeoFitter
from lifetimes import GammaGammaFitter
from lifetimes.plotting import plot_period_transactions
from sklearn.preprocessing import MinMaxScaler
import squarify  #treemap
import warnings
warnings.filterwarnings("ignore")

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname,filename))

**Load and Check Data**

In [None]:
df_2010_2011 = pd.read_excel("/kaggle/input/online-retail-ii-data-set-from-ml-repository/online_retail_II.xlsx",sheet_name="Year 2010-2011")

df = df_2010_2011.copy()
df.head()

In [None]:
# We deal with purchases in our analysis. Therefore, we have excluded returns from the data.

df = df[~df["Invoice"].str.contains("C",na=False)]
df.shape

**Data Preprocessing**

In [None]:
def check_df(dataframe):
    print("################ Shape ####################")
    print(dataframe.shape)
    print("############### Columns ###################")
    print(dataframe.columns)
    print("############### Types #####################")
    print(dataframe.dtypes)
    print("############### Head ######################")
    print(dataframe.head())
    print("############### Tail ######################")
    print(dataframe.tail())
    print("############### Describe ###################")
    print(dataframe.describe().T)

check_df(df)

In [None]:
df.isnull().sum()

In [None]:
df.dropna(inplace=True)
df.isnull().sum()

In [None]:
def outlier_thresholds(dataframe, variable):
    # Note: despite the variable names, these are the 1st and 99th percentiles,
    # not the 25th/75th quartiles, so only extreme values end up being capped.
    quartile1 = dataframe[variable].quantile(0.01)
    quartile3 = dataframe[variable].quantile(0.99)
    interquantile_range = quartile3 - quartile1
    up_limit = quartile3 + 1.5 * interquantile_range
    low_limit = quartile1 - 1.5 * interquantile_range
    return low_limit, up_limit

def replace_with_thresholds(dataframe, variable):
    low_limit, up_limit = outlier_thresholds(dataframe, variable)
    dataframe.loc[(dataframe[variable] < low_limit), variable] = low_limit
    dataframe.loc[(dataframe[variable] > up_limit), variable] = up_limit

replace_with_thresholds(df, "Quantity")
replace_with_thresholds(df, "Price")
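
The capping logic above can be checked by hand on a small list. The sketch below is a plain-Python approximation, assuming linear interpolation for quantiles (which I believe is the pandas default); the helper names and the sample data are hypothetical.

```python
def quantile(values, q):
    """Quantile with linear interpolation between neighbouring sorted values."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(s) - 1)
    return s[lower] + (s[upper] - s[lower]) * (pos - lower)

def thresholds(values, q_low=0.01, q_high=0.99):
    """Lower and upper caps: the chosen quantiles widened by 1.5 times their spread."""
    low_q, high_q = quantile(values, q_low), quantile(values, q_high)
    spread = high_q - low_q
    return low_q - 1.5 * spread, high_q + 1.5 * spread

# Hypothetical quantities: 1..100 plus one extreme value
quantities = list(range(1, 101)) + [10_000]
low_limit, up_limit = thresholds(quantities)

# Values outside the limits are pulled back to the nearest limit
capped = [min(max(v, low_limit), up_limit) for v in quantities]
print(low_limit, up_limit, max(capped))
```

Only the single extreme value is changed here; everything between the limits is left untouched, which is the point of using such wide percentiles.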

# Exploratory Data Analysis

**Categorical Variables**

In [None]:
cat_cols = [col for col in df.columns if df[col].dtypes =="O"]
cat_but_car = [col for col in df.columns if df[col].nunique() > 100 and df[col].dtypes == "O"]
cat_cols = [col for col in cat_cols if col not in cat_but_car]
cat_cols

In [None]:
def cat_summary(dataframe, col_name, plot=False):
    print(pd.DataFrame({col_name:dataframe[col_name].value_counts(),
                       "Ratio": 100 * dataframe[col_name].value_counts() / len(dataframe)}))
    print('#########################################')
    if plot:
        fig_dims = (15,5)
        fig,ax = plt.subplots(figsize=fig_dims)
        sns.countplot(x=dataframe[col_name],data=dataframe)
        plt.xticks(rotation = 45, ha='right')
        plt.show()
        
cat_summary(df,"Country",plot=True)

**Numerical Variables**

In [None]:
# Exclude the identifier column; compare by equality rather than substring membership
num_cols = [col for col in df.columns if df[col].dtypes != 'O' and col != "Customer ID"]
num_cols

In [None]:
def num_summary(dataframe,numerical_col, plot=False):
    quantiles = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99]
    print(dataframe[numerical_col].describe(quantiles).T)
    
    if plot:
        dataframe[numerical_col].hist(bins=20)
        plt.xlabel(numerical_col)
        plt.title(numerical_col)
        plt.show()
        
for col in num_cols:
    num_summary(df, col ,plot=True)

In [None]:
df["StockCode"].nunique()

In [None]:
# Number of invoice lines (sales records) per product
df_product = df.groupby("Description").agg({"Quantity":"count"})
df_product.reset_index(inplace=True)
df_product

In [None]:
top_pr = df_product.sort_values(by="Quantity",ascending=False).head(10)

sns.barplot(x="Description",y="Quantity",data=top_pr)
plt.xticks(rotation=90)
plt.show()

In [None]:
# Total price per invoice line (unit price * quantity)
df["TotalPrice"] = df["Price"]* df["Quantity"]

**Customer Segmentation With RFM**

**What is RFM?**

RFM analysis is a method for assessing customer value. It is frequently used in database marketing and direct marketing, as well as in retail and professional services.

RFM stands for the three dimensions:

* **Recency:** How recently did the customer purchase?
* **Frequency:** How often do they purchase?
* **Monetary Value:** How much do they spend?


**Preparation of RFM Metrics**

* **recency:** the number of days between the analysis date and the customer's last purchase
* **frequency:** the customer's shopping frequency (number of distinct invoices)
* **monetary:** total money paid by the customer
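
Before applying these definitions to the real data, the following plain-Python sketch works through them on four made-up invoice lines. The rows, the analysis date, and the name `rfm_toy` are assumptions for illustration only.

```python
from datetime import datetime

# Hypothetical transactions: (customer_id, invoice_id, invoice_date, line_total)
rows = [
    (1, "A1", datetime(2011, 12, 1), 20.0),
    (1, "A1", datetime(2011, 12, 1), 5.0),
    (1, "A2", datetime(2011, 12, 9), 10.0),
    (2, "B1", datetime(2011, 11, 11), 40.0),
]
analysis_date = datetime(2011, 12, 11)

rfm_toy = {}
for customer, invoice, date, total in rows:
    entry = rfm_toy.setdefault(customer, {"last": date, "invoices": set(), "monetary": 0.0})
    entry["last"] = max(entry["last"], date)   # most recent purchase date
    entry["invoices"].add(invoice)             # distinct invoices
    entry["monetary"] += total                 # total amount paid

for customer, entry in rfm_toy.items():
    recency = (analysis_date - entry["last"]).days   # days since last purchase
    frequency = len(entry["invoices"])               # number of distinct invoices
    print(customer, recency, frequency, entry["monetary"])
```

Customer 1 has two lines on invoice A1, but they count as a single purchase because frequency is based on distinct invoices.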

In [None]:
# Determining the analysis date for the recency
df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])
df["InvoiceDate"].max()
today_date = dt.datetime(2011, 12, 11)

In [None]:
# Generating RFM metrics
rfm = df.groupby("Customer ID").agg({"InvoiceDate": lambda date: (today_date - date.max()).days,
                                    "Invoice": lambda invoice: invoice.nunique(),
                                    "TotalPrice": lambda total_price: total_price.sum()})

rfm.columns = ["recency","frequency","monetary"]
rfm.describe().T

In [None]:
# monetary, the min value of the total money paid can't be 0
# let's remove them from the data

rfm = rfm[rfm["monetary"] > 0]
rfm.describe().T

**Generating RFM Scores**

In [None]:
# recency_score
rfm["recency_score"] = pd.qcut(rfm['recency'], 5, labels=[5, 4, 3, 2, 1])
# frequency_score
rfm["frequency_score"] = pd.qcut(rfm["frequency"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5])
# monetary_score
rfm["monetary_score"] = pd.qcut(rfm["monetary"], 5, labels=[1, 2, 3, 4, 5])

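# RFM score: concatenation of the recency and frequency scores
# (monetary_score is computed above but is not part of the segment map below)
rfm["RFM_SCORE"] = (rfm["recency_score"].astype(str) + rfm["frequency_score"].astype(str))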
rfm.head(10)
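
The scoring step can be pictured as "sort, then cut into five equal groups". The sketch below is a simplified stand-in for `pd.qcut` on ranks, written in plain Python; the function name and the ten sample customers are hypothetical, and it is only meant to show why ties are broken by rank for frequency.

```python
def quintile_scores(values, reverse=False):
    """Score each value 1-5 by its rank; tied values are ranked in order of appearance."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # stable sort
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        score = rank * 5 // len(values) + 1      # 1 = lowest fifth, 5 = highest fifth
        scores[i] = 6 - score if reverse else score
    return scores

# Hypothetical recency (days) and frequency (invoices) for ten customers
recency = [2, 30, 400, 15, 90, 7, 200, 60, 1, 120]
frequency = [12, 1, 1, 5, 2, 8, 1, 3, 20, 2]

r_scores = quintile_scores(recency, reverse=True)   # recent buyers get the high scores
f_scores = quintile_scores(frequency)
rf = [f"{r}{f}" for r, f in zip(r_scores, f_scores)]
print(rf)
```

Three customers share a frequency of 1 here, yet they are spread over scores 1 and 2. Ranking first guarantees five equally sized groups even when many customers have the same value, at the cost of splitting ties arbitrarily.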

**Segmenting Customers Based on RFM Scores**

In [None]:
seg_map = {
    r'[1-2][1-2]': 'hibernating',
    r'[1-2][3-4]': 'at_Risk',
    r'[1-2]5': 'cant_loose',
    r'3[1-2]': 'about_to_sleep',
    r'33': 'need_attention',
    r'[3-4][4-5]': 'loyal_customers',
    r'41': 'promising',
    r'51': 'new_customers',
    r'[4-5][2-3]': 'potential_loyalists',
    r'5[4-5]': 'champions'
}
rfm['segment'] = rfm['RFM_SCORE'].replace(seg_map, regex=True)
rfm.head(10)
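
A quick way to sanity-check a pattern map like this is to confirm that each of the 25 recency/frequency combinations matches exactly one pattern. The sketch below copies the map from the cell above into a separate name (`seg_map_demo`, a hypothetical name) and tests it with the standard `re` module instead of pandas.

```python
import re

# Copy of the segment map used above
seg_map_demo = {
    r'[1-2][1-2]': 'hibernating',
    r'[1-2][3-4]': 'at_Risk',
    r'[1-2]5': 'cant_loose',
    r'3[1-2]': 'about_to_sleep',
    r'33': 'need_attention',
    r'[3-4][4-5]': 'loyal_customers',
    r'41': 'promising',
    r'51': 'new_customers',
    r'[4-5][2-3]': 'potential_loyalists',
    r'5[4-5]': 'champions',
}

def segment_of(rf_score):
    """Return the label of the first pattern that matches the whole two-digit score."""
    for pattern, label in seg_map_demo.items():
        if re.fullmatch(pattern, rf_score):
            return label
    return None

# Every recency/frequency combination should match exactly one pattern
for r in "12345":
    for f in "12345":
        hits = [p for p in seg_map_demo if re.fullmatch(p, r + f)]
        assert len(hits) == 1, (r + f, hits)

print(segment_of("55"), segment_of("15"), segment_of("33"))
```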

In [None]:
# Mean and count of the RFM metrics for each segment
rfm[["segment", "recency", "frequency", "monetary"]].groupby("segment").agg(["mean", "count"])

# Visualization of RFM Segments

In [None]:
sgm= rfm["segment"].value_counts()
plt.figure(figsize=(10,7))
sns.barplot(x=sgm.index,y=sgm.values)
plt.xticks(rotation=45)
plt.title('Customer Segments',color = 'blue',fontsize=15)
plt.show()

In [None]:
# Treemap Visualization
df_treemap = rfm.groupby('segment').agg('count').reset_index()
df_treemap.head()

In [None]:
fig, ax = plt.subplots(1, figsize = (10,10))

squarify.plot(sizes=df_treemap['RFM_SCORE'], 
              label=df_treemap['segment'], 
              alpha=.8,
              color=['tab:red', 'tab:purple', 'tab:blue', 'tab:pink', 'tab:gray']
             )
plt.axis('off')
plt.show()
#plt.savefig('treemap.png')

**Customer Lifetime Value**

Customer lifetime value (CLV, also abbreviated CLTV or LTV) is a prediction of the net profit attributed to the entire future relationship with a customer. The prediction model can be simple or sophisticated, depending on how complex the predictive analytics techniques are.

Lifetime value is a critical metric because it represents an upper limit on what a company should spend to acquire new customers. As a result, it is an important input when calculating the payback of marketing spend in marketing mix modeling.

**Definition of CLTV**
The present value of the future cash flows attributed to the customer during his/her entire relationship with the company.

A basic calculation of this value describes a single point in time: the moment the analysis is done. What we want instead is a projection, so that we can evaluate customers over 3-month and 6-month horizons.

How can we make this inference? We will estimate lifetime value over medium- and long-term horizons for each individual by modeling the purchasing pattern of the whole population as a probability distribution and then conditioning that distribution on the characteristics of the particular individual.

**Formula**

Probabilistic lifetime value estimation with time projection

CLTV =( Customer Value / Churn Rate) * Profit Margin

Customer Value = Purchase Frequency * Average Order Value

CLTV = Expected Number of Transaction * Expected Average Profit

In the formulas above, purchase frequency and number of transactions mean the same thing, as do average order value and average profit. What changes in the last formula is the word "Expected": both quantities become probabilistic estimates.

**CAUTION:** "Expected" means that a probability distribution is involved. The expected number of purchases and the expected profit per purchase are each estimated from a probabilistic model:

* BG / NBD = Expected Number of Transactions
* Gamma Gamma = Expected Average Profit
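
As a worked example of the simple (non-probabilistic) formula, the sketch below plugs in made-up numbers. All input values are assumptions chosen for easy arithmetic and do not come from the dataset.

```python
# Hypothetical inputs for one customer segment
purchase_frequency = 3.0      # average number of purchases per period
average_order_value = 50.0    # average revenue per purchase
churn_rate = 0.2              # share of customers lost per period
profit_margin = 0.1           # profit as a share of revenue

# Customer Value = Purchase Frequency * Average Order Value
customer_value = purchase_frequency * average_order_value
# CLTV = (Customer Value / Churn Rate) * Profit Margin
cltv_simple = (customer_value / churn_rate) * profit_margin
print(round(cltv_simple, 2))
```

With these inputs the customer value is 150 per period, and dividing by a churn rate of 0.2 corresponds to an expected lifetime of five periods, so the result is about 75.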

**So How Do We Do That?**

We will add statistics and probability to the formula above using the BG / NBD and Gamma Gamma models. These models first learn the purchasing behavior of all of the company's customers. Then, given an individual customer's own characteristics, they derive the expected number of transactions (and the expected profit) for that person from the pattern of the whole population.

BG / NBD and Gamma Gamma are statistical models, not machine learning models. The quantities they produce are conditional expectations, which is why their names begin with the word "Conditional" (for example, the conditional expected number of transactions).



**Preparation-Data Structure of CLTV**

* **recency:** the difference between the customer's last purchase and his first purchase
* **T:** the age of the client in the company
* **frequency:** total number of repeat purchases
* **monetary_value:** average earnings per purchase
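
To see how these four quantities relate, the sketch below computes them for one made-up customer in plain Python, with time measured in weeks. The invoices and the analysis date are hypothetical, and frequency is counted as distinct invoices, which is how this notebook computes it.

```python
from datetime import datetime

# Hypothetical invoices for one customer: (invoice_id, date, invoice_total)
invoices = [
    ("A1", datetime(2011, 9, 4), 30.0),
    ("A2", datetime(2011, 10, 2), 50.0),
    ("A3", datetime(2011, 11, 27), 40.0),
]
analysis_date = datetime(2011, 12, 11)

dates = [d for _, d, _ in invoices]
recency_weeks = (max(dates) - min(dates)).days / 7   # first purchase to last purchase
T_weeks = (analysis_date - min(dates)).days / 7      # first purchase to analysis date
frequency = len({inv for inv, _, _ in invoices})     # distinct invoices
monetary = sum(total for _, _, total in invoices) / frequency  # average per purchase
print(recency_weeks, T_weeks, frequency, monetary)
```

Note that recency here is measured from the customer's own first purchase, not from the analysis date as in the RFM section, so it can never exceed T.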

In [None]:
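# Determining the analysis date used to compute T (customer age)

df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])
df["InvoiceDate"].max()
# Use the same analysis date as in the RFM section, just after the last invoice in the data.
# A date years after the data ends would inflate T for every customer.
today_date = dt.datetime(2011, 12, 11)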

In [None]:
cltv_df = df.groupby('Customer ID').agg({'InvoiceDate': [lambda date: (date.max() - date.min()).days,
                                                         lambda date: (today_date - date.min()).days],
                                         'Invoice': lambda num: num.nunique(),
                                         'TotalPrice': lambda TotalPrice: TotalPrice.sum()})


cltv_df.columns = cltv_df.columns.droplevel(0)
cltv_df.columns = ['recency', 'T', 'frequency', 'monetary']
cltv_df.head()

In [None]:
# monetary was computed above as the sum of TotalPrice.
# Here we express monetary as the average earnings per purchase.
cltv_df["monetary"] = cltv_df["monetary"] / cltv_df["frequency"]

# Keep only customers whose monetary value is greater than zero
cltv_df = cltv_df[cltv_df["monetary"] > 0]

# Express recency and T in weeks for BG/NBD
cltv_df["recency"] = cltv_df["recency"] / 7
cltv_df["T"] = cltv_df["T"] / 7

# Keep only customers whose frequency is greater than 1
cltv_df = cltv_df[(cltv_df['frequency'] > 1)]
cltv_df.head()

# BG - NBD Model

BG / NBD (Beta Geometric / Negative Binomial Distribution) = Expected Number of Transactions

**Buy Till You Die**

The BG/NBD Model probabilistically models two processes for the Expected Number of Transaction.

Transaction Process (Buy) + Dropout Process (Till You Die)

**Transaction Process(Buy)**

* While a customer is alive, the number of transactions they make in a given time period is Poisson distributed with their transaction rate as the parameter.
* As long as a customer is alive, they keep making random purchases around their own transaction rate.
* Transaction rates vary from customer to customer and are gamma distributed across the whole population (parameters r, α).

In other words, the "buy" process of the BG/NBD model describes the purchasing activity of the whole population with a gamma distribution over individual transaction rates.

**Dropout Process (Till You Die)**

* Each customer has a dropout probability p.
* A customer may drop out with this probability after making a purchase. Within the model, a customer who has dropped out makes no further purchases.
* Dropout probabilities vary from customer to customer and are beta distributed across the whole population (parameters a, b).
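
The two processes above can be simulated with the standard `random` module to get a feel for the behavior they imply. This is a sketch under the stated assumptions, not the fitting procedure used by `lifetimes`; the function name and the parameter values (r, alpha, a, b) are hypothetical and are not fitted to the dataset.

```python
import random

def simulate_customer(rng, r, alpha, a, b, horizon_weeks):
    """Simulate repeat-purchase times for one customer under BG/NBD-style assumptions."""
    rate = rng.gammavariate(r, 1.0 / alpha)   # individual transaction rate (per week)
    p = rng.betavariate(a, b)                 # individual dropout probability
    t, purchases = 0.0, []
    while True:
        t += rng.expovariate(rate)            # Poisson process: exponential gaps
        if t > horizon_weeks:
            break                             # observation window is over
        purchases.append(t)
        if rng.random() < p:                  # customer "dies" after this purchase
            break
    return purchases

rng = random.Random(42)
# Hypothetical population parameters (not fitted to the dataset)
population = [simulate_customer(rng, r=2.0, alpha=20.0, a=0.8, b=2.5, horizon_weeks=52)
              for _ in range(1000)]
counts = [len(p) for p in population]
print("mean repeat purchases in a year:", sum(counts) / len(counts))
```

Each simulated customer gets their own rate and dropout probability, so some buy often and leave early while others buy rarely but stay active for the whole year.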

In [None]:
bgf = BetaGeoFitter(penalizer_coef = 0.001)
bgf.fit(cltv_df['frequency'],
       cltv_df['recency'],
       cltv_df['T'])


In [None]:
# 1 week expected purchase (transaction)

cltv_df["expected_purc_1_week"] = bgf.predict(1,
                                             cltv_df['frequency'],
                                             cltv_df['recency'],
                                             cltv_df['T'])

cltv_df.sort_values("expected_purc_1_week", ascending=False).head(15)

In [None]:
# 1 month expected purchase

cltv_df["expected_purc_1_month"] = bgf.predict(4,
                                              cltv_df['frequency'],
                                              cltv_df['recency'],
                                              cltv_df['T'])

cltv_df.sort_values("expected_purc_1_month", ascending=False).head(15)

# Gamma Gamma Model

It is used to estimate how much profit a customer can generate on average per transaction.

**What will the gamma gamma model do ?** 

It outputs the Expected Average Profit. The distribution of average profit is modeled over the whole population, and the Gamma Gamma submodel then gives the expected average profit for an individual conditionally, that is, taking into account both the population-wide distribution and the individual's own characteristics.

In [None]:
ggf = GammaGammaFitter(penalizer_coef=0.01)
ggf.fit(cltv_df['frequency'],cltv_df['monetary'])

In [None]:
cltv_df["expected_average_profit"] = ggf.conditional_expected_average_profit(cltv_df['frequency'],
                                                                            cltv_df['monetary'])

cltv_df.sort_values("expected_average_profit",ascending=False).head(20)
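
The conditional expectation can be read as a weighted average of the population mean and the customer's own observed mean. The sketch below uses the commonly cited closed form for the Gamma-Gamma model with parameters p, q and v; the helper name is hypothetical, the parameter values are invented rather than fitted, and the library's implementation may differ in details.

```python
def gg_expected_average_profit(frequency, monetary, p, q, v):
    """Blend of the population mean and the customer's own mean (requires q > 1)."""
    population_mean = v * p / (q - 1)
    # The more purchases observed, the more weight the customer's own average gets
    individual_weight = p * frequency / (p * frequency + q - 1)
    return (1 - individual_weight) * population_mean + individual_weight * monetary

# Hypothetical parameter values, NOT the ones fitted by ggf above
p, q, v = 2.0, 3.0, 40.0     # population mean spend = 40 * 2 / (3 - 1) = 40

few = gg_expected_average_profit(frequency=2, monetary=100.0, p=p, q=q, v=v)
many = gg_expected_average_profit(frequency=50, monetary=100.0, p=p, q=q, v=v)
print(few, many)
```

Both customers have an observed average of 100, but the one with only two purchases is pulled further toward the population mean of 40, because two observations are weaker evidence than fifty.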

**BG-NBD and GG Model For Prediction**

In [None]:
# 6-month CLTV: `time` is given in months; freq="W" states that T and recency are in weeks
cltv = ggf.customer_lifetime_value(bgf,
                                  cltv_df['frequency'],
                                  cltv_df['recency'],
                                  cltv_df['T'],
                                  cltv_df['monetary'],
                                  time=6, 
                                  freq="W",
                                  discount_rate = 0.01)

In [None]:
# Reset Index
cltv = cltv.reset_index()
# Merging the main table and forecast values table 
cltv_final = cltv_df.merge(cltv,on="Customer ID", how="left")
#sorting
cltv_final.sort_values(by="clv",ascending=False).head(10)
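
The `discount_rate` argument reflects the idea that profit received later is worth less today. The sketch below illustrates that idea with a plain discounted sum over months; it is an assumption about the general approach, not a copy of the library's internal calculation, and the function name and inputs are hypothetical.

```python
def discounted_value(expected_purchases_per_month, expected_profit, monthly_discount_rate):
    """Sum expected monthly profit, discounting month i by (1 + rate) ** i."""
    total = 0.0
    for i, purchases in enumerate(expected_purchases_per_month, start=1):
        total += purchases * expected_profit / (1 + monthly_discount_rate) ** i
    return total

# Hypothetical customer: 0.5 expected purchases per month, 80 profit per purchase
monthly_purchases = [0.5] * 6
undiscounted = discounted_value(monthly_purchases, 80.0, 0.0)
discounted = discounted_value(monthly_purchases, 80.0, 0.01)
print(undiscounted, round(discounted, 2))
```

With a 1% monthly rate the six-month value is only slightly below the undiscounted total; the gap widens as the horizon or the rate grows.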


In [None]:
# 1 Month CLTV: 

cltv_1 = ggf.customer_lifetime_value(bgf,
                                    cltv_df['frequency'],
                                    cltv_df['recency'],
                                    cltv_df['T'],
                                    cltv_df['monetary'],
                                    time = 1,
                                    freq = "W",
                                    discount_rate = 0.01)
cltv_1.head()
cltv_1 = cltv_1.reset_index()
cltv_1 = cltv_df.merge(cltv_1, on="Customer ID", how="left")
cltv_1.sort_values(by="clv",ascending=False).head(10)

In [None]:
# 12 Month CLTV Forecast:

cltv_12 = ggf.customer_lifetime_value(bgf,
                                     cltv_df['frequency'],
                                     cltv_df['recency'],
                                     cltv_df['T'],
                                     cltv_df['monetary'],
                                     time=12,
                                     freq="W",
                                     discount_rate = 0.01)

cltv_12.head()
cltv_12 = cltv_12.reset_index()
cltv_12 = cltv_df.merge(cltv_12, on="Customer ID", how="left")
cltv_12.sort_values(by="clv",ascending=False).head(10)

**Segmentation on CLTV Forecasts**

In [None]:
# Normalization 0-1 Range for CLV Values

scaler = MinMaxScaler(feature_range=(0,1))
scaler.fit(cltv_final[["clv"]])
cltv_final["scaled_clv"] = scaler.transform(cltv_final[["clv"]])

cltv_final.sort_values(by="scaled_clv", ascending=False).head()
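
Min-max scaling is a simple linear rescaling, which the plain-Python sketch below reproduces on four made-up CLV values. The helper name and the numbers are hypothetical, and the sketch assumes the values are not all equal.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

clv_values = [120.0, 40.0, 900.0, 260.0]      # hypothetical CLV forecasts
scaled = min_max_scale(clv_values)
print(scaled)

# Scaling is monotonic, so the ranking of customers is unchanged
rank_raw = sorted(range(len(clv_values)), key=lambda i: clv_values[i])
rank_scaled = sorted(range(len(scaled)), key=lambda i: scaled[i])
print(rank_raw == rank_scaled)
```

Because the ordering is preserved, any quantile-based segmentation gives the same groups on the scaled values as on the raw ones; the scaling mainly makes the values easier to compare at a glance.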

In [None]:
# Segmentation of Customers 

cltv_final["segment"] = pd.qcut(cltv_final["scaled_clv"], 4, labels = ["D","C","B","A"])

cltv_final.head()


In [None]:
# Examination of Segments

# Pass the aggregations as a list (a set has no defined order)
cltv_final.groupby("segment").agg(["count", "mean", "sum"])

**References**

* https://www.veribilimiokulu.com/
* https://www.kaggle.com/haticeebraralc/crm-analytics
* https://en.wikipedia.org/wiki/Customer_lifetime_value
* https://en.wikipedia.org/wiki/RFM_(market_research)
* https://mebaysan.medium.com/customer-life-time-value-prediction-by-using-bg-nbd-gamma-gamma-models-and-applied-example-in-997a5ee481ad