# **Customer Lifetime Value Prediction**

<p>Customer lifetime value (CLTV) is the present value of a customer to the company, based on the projected future cash flows from the customer relationship. In practice, it estimates the total amount a customer is expected to spend on the business or its products over the lifetime of that relationship.</p>
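
To make the "present value of projected future cash flows" idea concrete, here is a minimal sketch with made-up numbers. The helper name `present_value`, the three monthly cash flows, and the 1% monthly discount rate are illustrative assumptions and do not come from the dataset.

```python
def present_value(cash_flows, discount_rate):
    """Discount a sequence of per-period cash flows back to today.

    cash_flows[0] is received at the end of period 1, cash_flows[1] at the
    end of period 2, and so on.
    """
    return sum(cf / (1 + discount_rate) ** period
               for period, cf in enumerate(cash_flows, start=1))


# Hypothetical customer expected to spend 100 per month for three months
projected = [100.0, 100.0, 100.0]
pv = present_value(projected, discount_rate=0.01)
print(round(pv, 2))
```

Later cash flows are discounted more heavily, so the present value is slightly below the undiscounted total of 300.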


## 1. Loading the Dataset & Checking Variables

In [None]:
pip install openpyxl

In [None]:
pip install xlrd

In [None]:
# Loading the required libraries
import datetime as dt
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
# Reading the online retail dataset
df_ = pd.read_excel("../input/online-retail-ii-data-set-from-ml-repository/online_retail_II.xlsx",sheet_name="Year 2010-2011")

# Copying the online retail dataset 
df = df_.copy()
df.head()

In [None]:
# Checking numerical variables
df.describe().T

In [None]:
# Counting missing values per column
df.isna().sum()

## 2. Data Preprocessing

In [None]:
# Data preparation step 1: Removing null observations
df.dropna(inplace=True)

# Data preparation step 1 (continued): Removing canceled orders (invoice codes containing "C")
df = df[~df["Invoice"].str.contains("C", na=False)]
df = df[df["Quantity"] > 0]

df.describe([0.01,0.25,0.50,0.75,0.99]).T

In [None]:
# Defining a function that computes outlier thresholds from the 1st and 99th percentiles
def outlier_thresholds(dataframe, variable):
    low_quantile = dataframe[variable].quantile(0.01)
    high_quantile = dataframe[variable].quantile(0.99)
    interquantile_range = high_quantile - low_quantile
    up_limit = high_quantile + 1.5 * interquantile_range
    low_limit = low_quantile - 1.5 * interquantile_range
    return low_limit, up_limit

# Defining a function that caps values above the upper threshold
def replace_with_thresholds(dataframe, variable):
    low_limit, up_limit = outlier_thresholds(dataframe, variable)
    # dataframe.loc[(dataframe[variable] < low_limit), variable] = low_limit
    dataframe.loc[(dataframe[variable] > up_limit), variable] = up_limit

In [None]:
# Data preparation step 2: Replacing outliers in the Quantity and Price columns with the upper limit
replace_with_thresholds(df, "Quantity")
replace_with_thresholds(df, "Price")

df.describe([0.01,0.25,0.50,0.75,0.99]).T
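
The capping step can be checked on a small synthetic series. This is a rough sketch: the helper `cap_upper` and the toy values are invented for illustration, and it mirrors only the upper-limit logic used in this notebook (1st and 99th percentiles with a 1.5 multiplier).

```python
import pandas as pd


def cap_upper(series, low_q=0.01, high_q=0.99, k=1.5):
    """Cap values above high quantile + k * (high quantile - low quantile)."""
    q_low = series.quantile(low_q)
    q_high = series.quantile(high_q)
    upper = q_high + k * (q_high - q_low)
    # Values above the limit are replaced by the limit; others are unchanged
    return series.clip(upper=upper), upper


# 199 ordinary values plus one extreme value (made-up data)
toy = pd.Series(list(range(1, 200)) + [100000], dtype="float64")
capped, upper = cap_upper(toy)
print(upper, capped.max())
```

Because the thresholds come from the 1st and 99th percentiles rather than the usual quartiles, only very extreme values are pulled down and the bulk of the data is left untouched.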

In [None]:
# Data preparation step 3: Calculating total price per transaction 
df["TotalPrice"] = df["Quantity"] * df["Price"]

In [None]:
# Defining today date as max(InvoiceDate) + 2 days
today_date = dt.datetime(2011, 12, 11)
print(f" Maximum invoice date: {df.InvoiceDate.max()} \n Today date: {today_date}")

## 3. Deriving the RFM Metrics

Important metrics:

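<p><b>Recency:</b> The age of the customer at the time of their last purchase, i.e. the number of days between their first and last purchase.</p>
<p><b>Monetary:</b> The average value of the customer's orders (total spend divided by number of orders).</p>
<p><b>Frequency:</b> The number of purchases, counted as distinct invoices.</p>
<p><b>Age (T):</b> The age of the customer, i.e. the number of days between their first purchase and the analysis date.</p>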
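
These four definitions can be sketched on a tiny hand-made transaction table. The customer IDs, invoice codes, dates, and amounts below are invented, and `analysis_date`, `span_days`, and `age_days` are hypothetical names used only for this illustration.

```python
import datetime as dt

import pandas as pd

# Invented transactions for two customers
toy = pd.DataFrame({
    "Customer ID": [1, 1, 1, 2, 2],
    "Invoice": ["A1", "A2", "A3", "B1", "B2"],
    "InvoiceDate": pd.to_datetime(
        ["2011-01-01", "2011-03-01", "2011-06-01", "2011-05-01", "2011-05-31"]),
    "TotalPrice": [10.0, 20.0, 30.0, 50.0, 70.0],
})
analysis_date = dt.datetime(2011, 12, 11)


def span_days(dates):
    """Recency: days between the first and the last purchase."""
    return (dates.max() - dates.min()).days


def age_days(dates):
    """T: days between the first purchase and the analysis date."""
    return (analysis_date - dates.min()).days


summary = toy.groupby("Customer ID").agg(
    Recency=("InvoiceDate", span_days),
    T=("InvoiceDate", age_days),
    Frequency=("Invoice", "nunique"),
    Monetary=("TotalPrice", "sum"),
)
# Convert total spend into average spend per order
summary["Monetary"] = summary["Monetary"] / summary["Frequency"]
print(summary)
```

Note that recency here is measured from the customer's own first purchase, not from the analysis date, which differs from the classic RFM meaning of the term.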

In [None]:
# Calculating recency, tenure (T), frequency and monetary metrics per customer
rfm = df.groupby("Customer ID").agg({"InvoiceDate": [lambda date: (date.max() - date.min()).days,  # recency
                                                     lambda date: (today_date - date.min()).days],  # T
                                     "Invoice": lambda num: num.nunique(),  # frequency
                                     "TotalPrice": lambda price: price.sum()})  # total spend per customer

rfm.columns = rfm.columns.droplevel(0)
rfm.columns = ["Recency", "T", "Frequency", "Monetary"]

# Calculating average monetary values per order:
rfm["Monetary"] = rfm["Monetary"] / rfm["Frequency"]

rfm.head()

In [None]:
# Removing one-time purchases from dataset
rfm = rfm[(rfm['Frequency'] > 1)]

# Copying dataset
cltv = rfm.copy()
rfm.head()

## 4. Train BG-NBD Model

In [None]:
pip install lifetimes

In [None]:
# Loading the required libraries
from lifetimes import BetaGeoFitter
from lifetimes import GammaGammaFitter
from lifetimes.plotting import plot_probability_alive_matrix

In [None]:
# Checking model assumptions and requirements
print(cltv[['Monetary', 'Frequency']].corr())  # The Gamma-Gamma model assumes monetary value and frequency are uncorrelated
cltv["Frequency"] = cltv["Frequency"].astype(int)  # Frequency must be an integer for the BG-NBD model

In [None]:
# Creating BG-NBD Model
bgf = BetaGeoFitter(penalizer_coef=0.001) # model object
bgf.fit(cltv['Frequency'], cltv['Recency'], cltv['T']) # model fitting

# Predicting the expected number of transactions for each customer over one year (365 days)
cltv['expctd_num_of_purch'] = bgf.predict(365, cltv['Frequency'], cltv['Recency'], cltv['T']) 
cltv.sort_values("expctd_num_of_purch",ascending=False).head()

In [None]:
%matplotlib inline
# set figure size
plt.figure(figsize=(10, 5))
plot_probability_alive_matrix(bgf)
plt.show()
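
The probability-alive matrix plotted above comes from the BG/NBD model's estimate that a customer is still active given their purchase history. Below is a minimal sketch of that formula in plain Python. The parameter values `r`, `alpha`, `a`, and `b` are made up for illustration (a fitted model would supply its own), and `probability_alive` is a hypothetical helper, not a `lifetimes` function. In the BG/NBD formulation, `frequency` counts repeat purchases.

```python
def probability_alive(frequency, recency, T, r, alpha, a, b):
    """BG/NBD probability that a customer is still active.

    frequency: number of repeat purchases
    recency:   time of the last purchase, measured from the first purchase
    T:         time from the first purchase to the analysis date
    """
    if frequency == 0:
        # With no repeat purchase there is no chance to have dropped out yet
        return 1.0
    ratio = (alpha + T) / (alpha + recency)
    return 1.0 / (1.0 + (a / (b + frequency - 1)) * ratio ** (r + frequency))


# Made-up parameters, for illustration only
params = dict(r=0.25, alpha=4.5, a=0.8, b=2.5)

# Two customers with the same frequency and age but different recency
recent_buyer = probability_alive(5, 300, 310, **params)
lapsed_buyer = probability_alive(5, 100, 310, **params)
print(recent_buyer, lapsed_buyer)
```

With equal frequency and age, the customer whose last purchase is far in the past gets a much lower probability of being alive, which is the pattern the matrix visualizes.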

## 5. Train Gamma-Gamma Model

In [None]:
# Creating Gamma-Gamma Model
ggf = GammaGammaFitter(penalizer_coef=0.01) # model object
ggf.fit(cltv['Frequency'], cltv['Monetary']) # model fitting

# Predicting the expected average spend per transaction for each customer
cltv["expct_avg_spend"] = ggf.conditional_expected_average_profit(cltv['Frequency'], cltv['Monetary'])

cltv.head()
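
The Gamma-Gamma estimate can be read as a weighted average of the population mean spend and the customer's own observed average. The sketch below shows that structure in plain Python. The parameters `p`, `q`, and `v` are made-up values rather than fitted ones, and `expected_average_spend` is a hypothetical helper written for this illustration.

```python
def expected_average_spend(frequency, avg_spend, p, q, v):
    """Gamma-Gamma conditional expectation of a customer's average spend."""
    population_mean = v * p / (q - 1)
    # The more transactions observed, the more weight on the customer's own mean
    individual_weight = p * frequency / (p * frequency + q - 1)
    return (1 - individual_weight) * population_mean + individual_weight * avg_spend


# Made-up parameters giving a population mean of 30
params = dict(p=6.0, q=4.0, v=15.0)

# Same observed average (100) backed by few or many transactions
few = expected_average_spend(2, 100.0, **params)
many = expected_average_spend(20, 100.0, **params)
print(few, many)
```

A customer with only a couple of orders is pulled further toward the population mean than one with many orders, which tempers estimates based on little evidence.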

## 6. Final: Calculate CLTV

In [None]:
# Calculating customer lifetime value by combining the BG-NBD and Gamma-Gamma models
cltv["cltv_one_year"] = ggf.customer_lifetime_value(bgf,
                                                    cltv['Frequency'],
                                                    cltv['Recency'],
                                                    cltv['T'],
                                                    cltv['Monetary'],
                                                    time=12,  # prediction horizon in months
                                                    freq="D",  # Recency and T are measured in days
                                                    discount_rate=0.01)  # monthly discount rate

cltv.sort_values("cltv_one_year",ascending=False).head()
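
Roughly speaking, this calculation sums the expected purchases in each month, multiplies them by the expected average spend, and discounts each month back to today. The sketch below illustrates that idea in plain Python under stated assumptions: `discounted_clv` and `steady` are hypothetical names, the 30-day month is a simplification, and the constant purchase rate stands in for a fitted BG-NBD prediction.

```python
def discounted_clv(cumulative_purchases, avg_spend, months=12,
                   days_per_month=30, monthly_discount_rate=0.01):
    """Sum discounted monthly revenue over the prediction horizon.

    cumulative_purchases(days) returns the expected number of purchases
    made within the first `days` days of the horizon.
    """
    clv = 0.0
    for month in range(1, months + 1):
        end = month * days_per_month
        start = end - days_per_month
        # Expected purchases that fall inside this month only
        purchases = cumulative_purchases(end) - cumulative_purchases(start)
        clv += avg_spend * purchases / (1 + monthly_discount_rate) ** month
    return clv


def steady(days):
    """Hypothetical customer buying at a constant 0.05 purchases per day."""
    return 0.05 * days


value = discounted_clv(steady, avg_spend=40.0)
print(round(value, 2))
```

Without discounting, this customer would be worth 12 months × 1.5 purchases × 40 = 720; the monthly discount rate lowers that figure.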