<a href="https://colab.research.google.com/github/marcorivera24/E-Commerce-Customer-Growth-Retention-Strategy/blob/main/E_Commerce_Customer_Growth_%26_Retention_Strategy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### E-Commerce Customer Growth & Retention Strategy

E-Commerce Customer Analytics & Growth Strategy

In competitive e-commerce markets, customer acquisition costs are rising while retention has become a key driver of profitability. Companies must leverage data to understand purchasing behavior, identify high-value customers, and proactively reduce churn.

This project analyzes large-scale transactional data from an online retail company to extract actionable business insights and build predictive models that support customer growth strategies. Using customer segmentation techniques (RFM analysis), behavioral analytics, and machine learning models, the project identifies high-value segments, at-risk customers, and revenue optimization opportunities.

The objectives of this project are to:

- Analyze customer purchasing behavior using exploratory data analysis
- Segment customers based on Recency, Frequency, and Monetary value
- Identify high-value and at-risk customer groups
- Develop predictive models to estimate churn probability
- Translate analytical findings into operational business recommendations

This project demonstrates the ability to transform raw transactional data into strategic insights that directly impact revenue growth, retention strategy, and marketing efficiency.

## Dataset Overview

This project uses a real-world transactional dataset from an online retail company. The dataset contains detailed information about customer purchases over a specific time period.

Each row represents a product purchased within a specific transaction.

The dataset includes information such as:

- Transaction ID (Invoice number)
- Product identifier and description
- Quantity purchased
- Unit price
- Transaction date and time
- Customer ID
- Customer country

Overall, the dataset contains hundreds of thousands of line items (one row per product per invoice), allowing for a comprehensive analysis of purchasing behavior and customer value.

## Import and Explore the Dataset
Import the Online Retail dataset from Excel and perform an initial data exploration.
The commented lines can be used to inspect the dataset structure (head, info, summary statistics).
Finally, we check for missing values to assess data completeness and identify potential cleaning steps.

In [4]:
import pandas as pd

df = pd.read_excel("Online Retail.xlsx")

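# Uncomment to inspect the dataset structure
# print(df.head())
# df.info()  # prints its report directly
# print(df.describe())

# Count missing values per column
print(df.isnull().sum())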

InvoiceNo           0
StockCode           0
Description      1454
Quantity            0
InvoiceDate         0
UnitPrice           0
CustomerID     135080
Country             0
dtype: int64


About 135,000 records (135,080) have no CustomerID. Because the objective of the project is customer-level analysis, these records were removed to ensure accurate segmentation and behavioral modeling. A further 1,454 records are missing a product description; these were left as-is because Description is not used in the revenue calculations.
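
To put the missing-value counts in context, it can help to express them as a share of all rows. The sketch below is illustrative only: `toy` is a small hypothetical frame (not the real dataset) that reuses three of the column names, and `missing_share` is a helper name introduced here.

```python
import pandas as pd


def missing_share(frame: pd.DataFrame) -> pd.Series:
    """Return the percentage of missing values per column, largest first."""
    return (frame.isnull().mean() * 100).sort_values(ascending=False)


# Hypothetical toy data standing in for the real dataset
toy = pd.DataFrame({
    "InvoiceNo": ["536365", "536366", "536367", "536368"],
    "Description": ["MUG", None, "LAMP", "CLOCK"],
    "CustomerID": [17850.0, None, None, 13047.0],
})

shares = missing_share(toy)
print(shares.round(1))
```

Applied to the real `df` before any rows are dropped, the same helper would show how much of the data the CustomerID filter removes.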

In [5]:
df = df.dropna(subset=["CustomerID"])

To ensure accurate behavioral analysis, cancelled invoices and product returns were removed from the dataset, as they do not represent completed purchases. Additionally, a Revenue variable was created by multiplying Quantity by UnitPrice to enable customer value and monetary analysis.

In [6]:
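# Remove cancelled invoices (InvoiceNo prefixed with "C")
df = df[~df["InvoiceNo"].astype(str).str.startswith("C")]

# Remove returns (non-positive quantities); copy so the Revenue column
# below is added to an independent frame rather than a slice
df = df[df["Quantity"] > 0].copy()

# Create revenue column
df["Revenue"] = df["Quantity"] * df["UnitPrice"]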
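
The cleaning rules above can be checked on a handful of rows before trusting them on the full dataset. This is a minimal sketch on hypothetical line items (`toy`, `is_cancelled`, `is_return`, and `clean` are names introduced here, and the values are made up); it applies the same two filters and reports how many rows each run drops.

```python
import pandas as pd

# Hypothetical line items: one cancellation ("C" prefix) and one
# negative quantity that is not flagged as a cancellation
toy = pd.DataFrame({
    "InvoiceNo": ["536365", "C536379", "536380", "536381"],
    "Quantity": [6, -1, -2, 3],
    "UnitPrice": [2.55, 27.50, 1.25, 4.00],
})

is_cancelled = toy["InvoiceNo"].astype(str).str.startswith("C")
is_return = toy["Quantity"] <= 0

# Keep only completed purchases, then compute revenue per line item
clean = toy[~is_cancelled & ~is_return].copy()
clean["Revenue"] = clean["Quantity"] * clean["UnitPrice"]

print(f"dropped {len(toy) - len(clean)} of {len(toy)} rows")
print(clean)
```

Keeping the two masks separate makes it easy to count how many rows each rule removes on its own.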

## Build RFM Analysis
To analyze customer behavior, an RFM (Recency, Frequency, Monetary) framework was implemented. The dataset was aggregated at the customer level to calculate the time since last purchase, number of unique transactions, and total revenue generated per customer.

In [7]:
df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])


In [8]:
# Snapshot date: one day after the last invoice, so Recency is at least 1 day
reference_date = df["InvoiceDate"].max() + pd.Timedelta(days=1)


In [9]:
# Create RFM table: one row per customer
rfm = df.groupby("CustomerID").agg({
    "InvoiceDate": lambda x: (reference_date - x.max()).days,  # days since last purchase
    "InvoiceNo": "nunique",                                     # number of distinct invoices
    "Revenue": "sum"                                            # total revenue
})

rfm.columns = ["Recency", "Frequency", "Monetary"]


In [10]:
print(rfm.head())


            Recency  Frequency  Monetary
CustomerID                              
12346.0         326          1  77183.60
12347.0           2          7   4310.00
12348.0          75          4   1797.24
12349.0          19          1   1757.55
12350.0         310          1    334.40
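
To make the three aggregations concrete, here is a small worked example on hypothetical data (two customers, three invoices; all values are made up). It uses pandas named aggregation as an alternative to renaming the columns afterwards; the logic is otherwise the same as the cells above.

```python
import pandas as pd

# Hypothetical line items: customer 1 has two invoices (A1 spans two rows)
toy = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2],
    "InvoiceNo": ["A1", "A1", "A2", "B1"],
    "InvoiceDate": pd.to_datetime([
        "2011-12-01 10:00", "2011-12-01 10:00",
        "2011-12-08 09:00", "2011-11-09 12:00",
    ]),
    "Revenue": [10.0, 5.0, 20.0, 7.5],
})

# Snapshot date: one day after the most recent invoice in the data
reference_date = toy["InvoiceDate"].max() + pd.Timedelta(days=1)

toy_rfm = toy.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda x: (reference_date - x.max()).days),
    Frequency=("InvoiceNo", "nunique"),
    Monetary=("Revenue", "sum"),
)
print(toy_rfm)
```

Customer 1 ends up with Recency 1, Frequency 2 (invoice A1 is counted once even though it spans two rows), and Monetary 35.0. Customer 2 has Recency 29 because `.days` truncates the partial day.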


## Create RFM Scores
Customers were segmented using an RFM scoring framework. Each customer was assigned a score from 1 to 5 for Recency, Frequency, and Monetary value. Based on these scores, customers were classified into segments such as VIP, Loyal, At Risk, and Regular, enabling targeted retention and marketing strategies.

In [12]:
# Score Recency (reversed: fewer days since last purchase = higher score)
rfm["R_Score"] = pd.qcut(rfm["Recency"], 5, labels=[5,4,3,2,1])


In [13]:
# Score Frequency (rank first so heavily tied values still split into five bins)
rfm["F_Score"] = pd.qcut(rfm["Frequency"].rank(method="first"), 5, labels=[1,2,3,4,5])


In [14]:
#Score Monetary
rfm["M_Score"] = pd.qcut(rfm["Monetary"], 5, labels=[1,2,3,4,5])


In [15]:
#Create Combined RFM Score
rfm["RFM_Score"] = (
    rfm["R_Score"].astype(str) +
    rfm["F_Score"].astype(str) +
    rfm["M_Score"].astype(str)
)
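
The Frequency score is computed on ranks rather than raw values, and the reason is easy to miss. The sketch below uses a hypothetical frequency series (`freq`, values made up) in which most customers bought once: quantile edges then collide and a direct `pd.qcut` raises an error, while ranking with `method="first"` always yields five equal-sized bins.

```python
import pandas as pd

# Hypothetical frequencies: most customers bought once, so quantile edges repeat
freq = pd.Series([1, 1, 1, 1, 1, 1, 2, 2, 3, 10])

try:
    pd.qcut(freq, 5, labels=[1, 2, 3, 4, 5])
    direct_ok = True
except ValueError:
    # qcut refuses duplicate bin edges by default
    direct_ok = False

# Ranking with method="first" gives every row a distinct rank (ties broken
# by order of appearance), so five equal-sized bins always exist
scores = pd.qcut(freq.rank(method="first"), 5, labels=[1, 2, 3, 4, 5])

print("direct qcut succeeded:", direct_ok)
print(scores.astype(int).tolist())
```

One consequence of this choice is that customers with the same frequency can receive different scores, depending on their row order. That trade-off is worth keeping in mind when interpreting F_Score near bin boundaries.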

In [17]:
def segment_customer(row):
    if row["R_Score"] >= 4 and row["F_Score"] >= 4 and row["M_Score"] >= 4:
        return "VIP"
    elif row["R_Score"] >= 3 and row["F_Score"] >= 3:
        return "Loyal"
    elif row["R_Score"] <= 2 and row["F_Score"] >= 3:
        return "At Risk"
    else:
        return "Regular"

rfm["Segment"] = rfm.apply(segment_customer, axis=1)
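
As an alternative to the row-wise `apply`, the same rules can be written in vectorized form with `numpy.select`, which tends to be faster on larger tables. This is a sketch, not the notebook's method: `assign_segments` and `toy_scores` are names introduced here, and the scores are cast to integers first so the comparisons do not depend on the categorical dtype produced by `pd.qcut`.

```python
import numpy as np
import pandas as pd


def assign_segments(scores: pd.DataFrame) -> pd.Series:
    """Map R/F/M scores to a segment label; the first matching rule wins."""
    r = scores["R_Score"].astype(int)
    f = scores["F_Score"].astype(int)
    m = scores["M_Score"].astype(int)

    # Same rule order as segment_customer: VIP, then Loyal, then At Risk
    conditions = [
        (r >= 4) & (f >= 4) & (m >= 4),
        (r >= 3) & (f >= 3),
        (r <= 2) & (f >= 3),
    ]
    labels = ["VIP", "Loyal", "At Risk"]
    values = np.select(conditions, labels, default="Regular")
    return pd.Series(values, index=scores.index, name="Segment")


# Hypothetical scores, one row per customer
toy_scores = pd.DataFrame({
    "R_Score": [5, 4, 1, 2, 3],
    "F_Score": [5, 3, 4, 1, 2],
    "M_Score": [4, 5, 2, 1, 5],
})
segments = assign_segments(toy_scores)
print(segments.tolist())
```

Because `numpy.select` evaluates the conditions in order and takes the first match, it reproduces the `if`/`elif` precedence of the original function.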


In [20]:
print(rfm["Segment"].value_counts())


Segment
Regular    1736
Loyal       998
VIP         962
At Risk     643
Name: count, dtype: int64


In [19]:
print(rfm.groupby("Segment")[["Recency","Frequency","Monetary"]].mean())


            Recency  Frequency     Monetary
Segment                                    
At Risk  152.844479   3.405910  1244.994636
Loyal     34.091182   3.714429  1477.081714
Regular  147.904378   1.139401   476.617357
VIP       12.861746  11.082121  6038.816081


In [22]:
segment_revenue = rfm.groupby("Segment")["Monetary"].sum().sort_values(ascending=False)
print(segment_revenue)


Segment
VIP        5809341.070
Loyal      1474127.551
Regular     827407.732
At Risk     800531.551
Name: Monetary, dtype: float64
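
Absolute totals are easier to interpret as shares. The sketch below takes the segment revenue totals and customer counts from the outputs above and computes each segment's share of revenue and of customers; `summary` is a name introduced here.

```python
import pandas as pd

# Figures copied from the printed outputs above
segment_revenue = pd.Series({
    "VIP": 5809341.070,
    "Loyal": 1474127.551,
    "Regular": 827407.732,
    "At Risk": 800531.551,
})
segment_counts = pd.Series({
    "Regular": 1736,
    "Loyal": 998,
    "VIP": 962,
    "At Risk": 643,
})

# pandas aligns the two Series on the segment label
summary = pd.DataFrame({
    "RevenueShare": segment_revenue / segment_revenue.sum() * 100,
    "CustomerShare": segment_counts / segment_counts.sum() * 100,
}).round(1)

print(summary.sort_values("RevenueShare", ascending=False))
```

On these figures, VIP customers account for roughly 65% of revenue while making up roughly 22% of customers.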


VIP Customers Drive the Majority of Revenue

VIP customers generate approximately 5.8 million in revenue, about 65% of the 8.9 million total and by far the largest share of any segment.

Although VIPs make up only 962 of 4,339 customers (about 22%), they contribute the majority of total revenue. This indicates strong revenue concentration within a high-value segment.

This suggests the business depends heavily on retaining and nurturing these customers.


Loyal Customers Represent Growth Potential

Loyal customers generate approximately 1.47 million in revenue, about 17% of the total.

They purchase recently and repeatedly (about 3.7 orders on average) but spend far less than VIP customers: roughly 1,480 per customer versus 6,040. With targeted upselling and loyalty incentives, this segment could potentially transition into VIP status.

At-Risk Customers Represent Recoverable Revenue

At-risk customers account for approximately 800,000 in revenue, about 9% of the total.

Although they have not purchased recently (about 153 days since their last order on average), they previously demonstrated moderate purchasing behavior, averaging about 3.4 orders each. Losing this segment could result in significant revenue loss.

This group represents a high-priority opportunity for re-engagement campaigns.

This segmentation analysis demonstrates how transactional data can be transformed into actionable strategic insights. By identifying revenue-driving segments and at-risk customers, the company can implement targeted retention and growth strategies to maximize customer lifetime value and long-term profitability.