## **Multi Mart data for Frequency and Revenue Analysis**

Frequency and Revenue Analysis estimates the total future profit expected from the relationship with a given customer. Businesses use it to measure how well retention strategies perform and to decide how much to spend on customer acquisition.

### **Data Dictionary**

**Objective**: To understand and gain insights from an e-commerce dataset through exploratory data analysis, data visualization, and data modelling.<br>
<br>
**Dataset Columns:**

**customerid** : Unique customer ID.<br>
**firstpurchasedate** : Date of the customer's first purchase or transaction with the organization.<br>
**lastpurchasedate** : Date of the customer's most recent purchase or transaction with the organization.<br>
**totalpurchases** : Total number of purchases the customer has made with the organization.<br>
**totalrevenue** : Total revenue generated from the customer's transactions with the organization.<br>
**referralsource** : How the customer found out about the products or services.<br>
**churnindicator** : Binary flag indicating whether the customer has churned (stopped using the products or services) or is still active. Typically, 1 or "Yes" indicates churn and 0 or "No" indicates an active customer.<br>
**discountsused** : Discount usage across the customer's purchases or orders (integer values in the data).<br>
**productcategory** : Category or group the product belongs to, based on its characteristics or purpose.<br>
**responsetolastcampaign** : Whether and how the customer responded to the most recent marketing campaign.<br>
**feedbackscore** : Numeric score or rating the customer gave as feedback for a product, service, or experience.<br>
**preferredpaymentmethod** : The customer's preferred payment method.<br>
**supportticketsraised** : Number of support tickets the customer has raised to seek assistance, report issues, or make inquiries.<br>
**hasloyaltycard** : Binary indicator of whether the customer holds a loyalty card with the organization.<br>
**frequency** : How often the customer interacts with the organization, for example by making purchases, engaging with the services, or participating in activities. It is based on the period between the first purchase date and the last purchase date.
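
The exact formula behind `frequency` is not stated in this dictionary. As a minimal sketch of the inputs it draws on, the snippet below parses the two date columns for three rows copied from the dataset preview and computes the span between first and last purchase. The `active_days` and `purchases_per_30_days` columns are hypothetical illustrations and are not part of the dataset.

```python
import pandas as pd

# Three rows copied from the dataset preview (date and purchase columns only)
sample = pd.DataFrame({
    "customerid": [8519, 19680, 11663],
    "firstpurchasedate": ["2021-12-31", "2021-06-13", "2021-01-19"],
    "lastpurchasedate": ["2022-03-06", "2022-02-04", "2022-03-10"],
    "totalpurchases": [7, 29, 13],
})

# Convert the date strings to datetime values so they can be subtracted
for column in ["firstpurchasedate", "lastpurchasedate"]:
    sample[column] = pd.to_datetime(sample[column])

# Hypothetical helper column: days between first and last purchase
sample["active_days"] = (
    sample["lastpurchasedate"] - sample["firstpurchasedate"]
).dt.days

# Hypothetical rate (an assumption, not the dataset's own formula):
# purchases per 30 days of activity
sample["purchases_per_30_days"] = (
    sample["totalpurchases"] / sample["active_days"] * 30
)

print(sample[["customerid", "active_days", "purchases_per_30_days"]])
```

Comparing such a derived rate with the stored `frequency` values is one way to check how the column was actually built.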


In [1]:
# importing the necessary libraries
import pandas as pd
import numpy as np

In [2]:
# reading the data
data = pd.read_csv('Customer_Lifetime_Value_Dataset.csv')

# printing the head of the data
data.head()

Unnamed: 0,customerid,firstpurchasedate,lastpurchasedate,totalpurchases,totalrevenue,referralsource,churnindicator,discountsused,productcategory,responsetolastcampaign,feedbackscore,preferredpaymentmethod,supportticketsraised,hasloyaltycard,frequency
0,8519,2021-12-31,2022-03-06,7,11670,Online advertisements,0,2,Q02,ignored,4.729998,debit card,0,no,7
1,38152,2019-09-27,2023-02-02,20,5260,Traditional media outreach,1,6,F76,purchased,4.184512,cash,0,no,2
2,19680,2021-06-13,2022-02-04,29,9790,Influencer endorsements,0,2,X04,opened mail,4.34664,google pay,0,no,4
3,35744,2021-07-28,2022-08-21,15,9591,Influencer endorsements,0,5,A25,ignored,5.0,debit card,0,no,13
4,11663,2021-01-19,2022-03-10,13,10134,Word of mouth,0,3,A16,ignored,4.482089,credit card,0,no,11
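
The `Unnamed: 0` column in the preview is what pandas produces when a CSV's first header cell is empty, which usually happens when a DataFrame was saved together with its index. A minimal sketch, assuming that is the case here, reads an in-memory excerpt with `index_col=0` and parses the two date columns at load time. The excerpt and the names `csv_text` and `sample` are illustrative only.

```python
import io

import pandas as pd

# In-memory excerpt shaped like the file: the first header cell is empty
csv_text = """,customerid,firstpurchasedate,lastpurchasedate,totalpurchases,totalrevenue
0,8519,2021-12-31,2022-03-06,7,11670
1,38152,2019-09-27,2023-02-02,20,5260
"""

sample = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,  # use the saved index instead of creating "Unnamed: 0"
    parse_dates=["firstpurchasedate", "lastpurchasedate"],
)

print(sample.dtypes)
```

The same two arguments can be passed when reading the full file, so the dates arrive as datetime values rather than strings.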


In [3]:
from sklearn.preprocessing import LabelEncoder

# Select the columns to be label encoded
columns_to_encode = ['hasloyaltycard']

# Create a LabelEncoder instance
label_encoder = LabelEncoder()

# Create a new DataFrame 'encoded_data' to store the label-encoded data
encoded_data = data.copy()

# Iterate over the selected columns and apply label encoding
for column in columns_to_encode:
    # Fit and transform the column
    encoded_data[column] = label_encoder.fit_transform(encoded_data[column])

encoded_data.head()

Unnamed: 0,customerid,firstpurchasedate,lastpurchasedate,totalpurchases,totalrevenue,referralsource,churnindicator,discountsused,productcategory,responsetolastcampaign,feedbackscore,preferredpaymentmethod,supportticketsraised,hasloyaltycard,frequency
0,8519,2021-12-31,2022-03-06,7,11670,Online advertisements,0,2,Q02,ignored,4.729998,debit card,0,0,7
1,38152,2019-09-27,2023-02-02,20,5260,Traditional media outreach,1,6,F76,purchased,4.184512,cash,0,0,2
2,19680,2021-06-13,2022-02-04,29,9790,Influencer endorsements,0,2,X04,opened mail,4.34664,google pay,0,0,4
3,35744,2021-07-28,2022-08-21,15,9591,Influencer endorsements,0,5,A25,ignored,5.0,debit card,0,0,13
4,11663,2021-01-19,2022-03-10,13,10134,Word of mouth,0,3,A16,ignored,4.482089,credit card,0,0,11
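
The preview only shows that `no` became `0`. To confirm which integer each label received, the fitted encoder's `classes_` attribute can be inspected: `LabelEncoder` sorts the labels and assigns codes in that order. The sketch below assumes the column holds the strings `no` and `yes` (only `no` is visible in the preview), and the small `sample` frame is made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up values; the real column is assumed to contain "no" and "yes"
sample = pd.DataFrame({"hasloyaltycard": ["no", "yes", "no", "no", "yes"]})

encoder = LabelEncoder()
sample["hasloyaltycard_encoded"] = encoder.fit_transform(sample["hasloyaltycard"])

# classes_ is sorted, and the position of each label is its integer code
mapping = {label: code for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform recovers the original labels from the codes
restored = encoder.inverse_transform(sample["hasloyaltycard_encoded"])
print(list(restored))
```

Knowing the mapping matters later, because precision and recall are reported for the class encoded as `1`.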


In [4]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

target_column = 'hasloyaltycard'
features = ['feedbackscore', 'totalrevenue', 'totalpurchases', 'frequency', 'churnindicator']

# Create a new DataFrame with only the selected columns
new_data = encoded_data[features]

# Feature matrix (the selected numeric columns) and target vector
X = new_data
y = encoded_data[target_column]

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and fit the RandomForestClassifier
rfc = RandomForestClassifier(n_estimators=100, random_state=42)
rfc.fit(X_train, y_train)

# Predict the target values
y_pred = rfc.predict(X_test)

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, r2_score

# Compare the true test labels (y_test) with the predicted labels (y_pred)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1-score:", f1)

# R^2 is normally a regression metric; it is printed here alongside the classification scores
print("r^2:", r2_score(y_test, y_pred))

Accuracy: 0.9707513867876955
Precision: 0.8713235294117647
Recall: 0.9115384615384615
F1-score: 0.8909774436090225
r^2: 0.7432608598598152
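
Accuracy is noticeably higher than precision and recall in the output above. How the four classification scores relate is easiest to see from the confusion counts. The sketch below derives them by hand on a small made-up label set; the labels are illustrative and are not taken from the dataset.

```python
from collections import Counter

# Made-up true and predicted labels (1 = has a loyalty card)
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

# Count each (true, predicted) pair
counts = Counter(zip(y_true, y_pred))
tp = counts[(1, 1)]  # true positives
fp = counts[(0, 1)]  # false positives
fn = counts[(1, 0)]  # false negatives
tn = counts[(0, 0)]  # true negatives

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print("TP, FP, FN, TN:", tp, fp, fn, tn)
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1-score:", f1)
```

Because true negatives count toward accuracy but not toward precision or recall, accuracy can sit well above the other scores when the negative class is the larger one.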


In [5]:
new_data.columns

Index(['feedbackscore', 'totalrevenue', 'totalpurchases', 'frequency',
       'churnindicator'],
      dtype='object')

In [6]:
import joblib
 
# Serialize (save) the trained model to a file
joblib.dump(rfc, 'random_forest_classifier.pkl')

['random_forest_classifier.pkl']
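
A saved model is typically restored with `joblib.load` and should return the same predictions as the original object. The following is a minimal sketch of that round trip on made-up data. `X_demo`, `y_demo`, and the temporary directory are assumptions for illustration, and the five random columns merely stand in for the five feature columns listed by `new_data.columns`.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Made-up data with five feature columns, standing in for the real features
rng = np.random.default_rng(42)
X_demo = rng.random((40, 5))
y_demo = (X_demo[:, 0] > 0.5).astype(int)

model = RandomForestClassifier(n_estimators=10, random_state=42)
model.fit(X_demo, y_demo)

# Save to a temporary location, then load the model back
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "random_forest_classifier.pkl")
    joblib.dump(model, model_path)
    restored = joblib.load(model_path)

# The restored model should reproduce the original predictions
same_predictions = np.array_equal(model.predict(X_demo), restored.predict(X_demo))
print(same_predictions)
```

When the real model is loaded for scoring, the input needs the same feature columns in the same order as the training data.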