# RFM ANALYSIS FOR ONLINE RETAIL DATA SET

## Data Set Information:

https://archive.ics.uci.edu/ml/datasets/Online+Retail+II

This Online Retail II data set contains all the transactions occurring for a UK-based and registered, non-store online retailer between 01/12/2009 and 09/12/2011. The company mainly sells unique all-occasion giftware. Many of the company's customers are wholesalers.

## Attribute Information:

InvoiceNo: Invoice number. Nominal. A 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'c', it indicates a cancellation.
StockCode: Product (item) code. Nominal. A 5-digit integral number uniquely assigned to each distinct product.
Description: Product (item) name. Nominal.
Quantity: The quantities of each product (item) per transaction. Numeric.
InvoiceDate: Invoice date and time. Numeric. The day and time when a transaction was generated.
UnitPrice: Unit price. Numeric. Product price per unit in sterling (£).
CustomerID: Customer number. Nominal. A 5-digit integral number uniquely assigned to each customer.
Country: Country name. Nominal. The name of the country where a customer resides.
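
The attribute list above notes that an invoice code starting with the letter 'c' marks a cancellation. A minimal sketch of how such rows could be flagged is shown below. The toy rows are invented for illustration, the column name `Invoice` follows the notebook's code cells, and separating cancellations is an assumption here, not a step the notebook itself performs.

```python
import pandas as pd

# Toy rows (invented); "Invoice" mixes integers and strings, as can happen
# when the column is read from a spreadsheet.
toy = pd.DataFrame(
    {
        "Invoice": [536365, "C536379", 536380, "c536383"],
        "Quantity": [6, -1, 24, -1],
    }
)

# Cast to str first so .str methods work on every row, then compare
# case-insensitively because the description uses a lowercase 'c'.
is_cancelled = toy["Invoice"].astype(str).str.upper().str.startswith("C")

cancelled = toy[is_cancelled]
completed = toy[~is_cancelled]
print(len(cancelled), "cancelled rows,", len(completed), "completed rows")
```

Whether cancelled rows should be dropped before computing RFM metrics depends on the business question, so the sketch only separates them.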

## Data Understanding

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns

In [None]:
import matplotlib.pyplot as plt

# Show every column and row when printing DataFrames
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
# Print floats without decimal places
pd.set_option('display.float_format', lambda x: '%.0f' % x)

In [None]:
df_2010_2011 = pd.read_excel("../input/online-retail-ii-data-set-from-ml-repository/online_retail_II.xlsx", sheet_name = "Year 2010-2011")

In [None]:
df = df_2010_2011.copy()

In [None]:
df.head()

In [None]:
# Unique products
df["Description"].nunique()

In [None]:
# Number of rows (invoice lines) per product description
df["Description"].value_counts().head()

In [None]:
# Total quantity sold per product (unsorted; the next cell sorts to find the best-sellers)
df.groupby("Description").agg({"Quantity":"sum"}).head()

In [None]:
df.groupby("Description").agg({"Quantity":"sum"}).sort_values("Quantity", ascending = False).head()

In [None]:
# Unique invoice
df["Invoice"].nunique()

In [None]:
df["TotalPrice"] = df["Quantity"]*df["Price"]

In [None]:
df.head()

In [None]:
# Invoices with the highest total price
df.groupby("Invoice").agg({"TotalPrice":"sum"}).sort_values("TotalPrice", ascending = False).head()

In [None]:
# The most expensive product is "POSTAGE"
df.sort_values("Price", ascending = False).head()

In [None]:
df["Country"].value_counts().head()

## Data Preparation

In [None]:
df.isnull().sum()

In [None]:
df.dropna(inplace = True)

In [None]:
df.shape

In [None]:
df.describe([0.01,0.05,0.10,0.25,0.50,0.75,0.90,0.95, 0.99]).T

In [None]:
# Flag extreme values using fences built from the 1st and 99th percentiles
for feature in ["Quantity", "Price", "TotalPrice"]:
    low_q = df[feature].quantile(0.01)
    high_q = df[feature].quantile(0.99)
    spread = high_q - low_q
    upper = high_q + 1.5 * spread
    lower = low_q - 1.5 * spread

    # Rows outside [lower, upper] are treated as outliers
    outliers = (df[feature] > upper) | (df[feature] < lower)
    if outliers.any():
        print(feature, "yes")
        print(outliers.sum())
    else:
        print(feature, "no")

## RFM SCORES

In [None]:
df.head()

### Recency

In [None]:
df["InvoiceDate"].min()

In [None]:
df["InvoiceDate"].max()

In [None]:
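import datetime as dt
# Reference date for recency. This is midnight on 2011-12-09, so any invoice
# timestamped later on that same day produces a recency of -1 days.
today_date = dt.datetime(2011,12,9)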

In [None]:
today_date

In [None]:
df.groupby("Customer ID").agg({"InvoiceDate":"max"}).head()

In [None]:
df["Customer ID"] = df["Customer ID"].astype(int)

In [None]:
(today_date - df.groupby("Customer ID").agg({"InvoiceDate":"max"})).head()

In [None]:
temp_df = (today_date - df.groupby("Customer ID").agg({"InvoiceDate":"max"}))

In [None]:
temp_df.rename(columns={"InvoiceDate": "Recency"}, inplace = True)

In [None]:
temp_df.head()

In [None]:
recency_df = temp_df["Recency"].dt.days

In [None]:
recency_df.head()
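
The recency steps above can be sketched on a few toy rows. The data below is invented, and the snapshot date (midnight of the day after the last invoice) is an assumption chosen for illustration, not the reference date the notebook uses. The sketch also shows what a reference date at midnight on the last invoice day does to a customer who bought later that day.

```python
import datetime as dt
import pandas as pd

# Toy transactions (invented) with the notebook's column names
toy = pd.DataFrame(
    {
        "Customer ID": [1, 1, 2],
        "InvoiceDate": pd.to_datetime(
            ["2011-11-01 10:00", "2011-12-09 12:50", "2011-10-10 09:30"]
        ),
    }
)

# Most recent purchase per customer
last_purchase = toy.groupby("Customer ID")["InvoiceDate"].max()

# Assumed snapshot: midnight of the day after the latest invoice,
# so no customer can end up with a negative recency.
snapshot = toy["InvoiceDate"].max().normalize() + dt.timedelta(days=1)
recency = (snapshot - last_purchase).dt.days

# For comparison: a fixed reference date at midnight on the last invoice day
recency_fixed = (dt.datetime(2011, 12, 9) - last_purchase).dt.days

print(recency.to_dict())
print(recency_fixed.to_dict())
```

With the assumed snapshot, customer 1 gets a recency of 0 days, whereas the fixed midnight date gives the same customer -1.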

### Frequency

In [None]:
# Number of invoice lines (rows) for each customer and invoice
temp_df = df.groupby(["Customer ID","Invoice"]).agg({"Invoice":"count"})

In [None]:
temp_df.head()

In [None]:
temp_df.groupby("Customer ID").agg({"Invoice":"sum"}).head()

In [None]:
freq_df = temp_df.groupby("Customer ID").agg({"Invoice":"sum"})
freq_df.rename(columns={"Invoice": "Frequency"}, inplace = True)
freq_df.head()
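
The aggregation above counts rows per invoice and then sums those counts, so `Frequency` here is the number of invoice lines per customer. Another common definition of frequency is the number of distinct invoices. The sketch below contrasts the two on invented toy data; it is an alternative for comparison, not the notebook's method, and the names `line_count` and `invoice_count` are hypothetical.

```python
import pandas as pd

# Toy data (invented): customer 1 has two invoices, one of them with two lines
toy = pd.DataFrame(
    {
        "Customer ID": [1, 1, 1, 2],
        "Invoice": ["A", "A", "B", "C"],
    }
)

# Invoice lines per customer (matches the two-step count-then-sum approach)
line_count = toy.groupby("Customer ID")["Invoice"].count()

# Distinct invoices per customer (alternative definition of frequency)
invoice_count = toy.groupby("Customer ID")["Invoice"].nunique()

print(line_count.to_dict())
print(invoice_count.to_dict())
```

The two definitions diverge for customers who buy many items per order, which matters for a data set where many customers are wholesalers.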

### Monetary

In [None]:
monetary_df = df.groupby("Customer ID").agg({"TotalPrice":"sum"})

In [None]:
monetary_df.head()

In [None]:
monetary_df.rename(columns={"TotalPrice": "Monetary"}, inplace = True)

In [None]:
print(recency_df.shape,freq_df.shape,monetary_df.shape)

In [None]:
rfm = pd.concat([recency_df, freq_df, monetary_df],  axis=1)

In [None]:
rfm.head()

In [None]:
rfm["RecencyScore"] = pd.qcut(rfm['Recency'], 5, labels = [5, 4, 3, 2, 1])

In [None]:
rfm["FrequencyScore"] = pd.qcut(rfm['Frequency'], 5, labels = [1, 2, 3, 4, 5])

In [None]:
rfm["MonetaryScore"] = pd.qcut(rfm['Monetary'], 5, labels = [1, 2, 3, 4, 5])

In [None]:
rfm.head()
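
The scoring step can be illustrated on small invented series. `pd.qcut` splits values into five equal-sized bins; the reversed labels for recency give the most recent customers the highest score. When many values are tied, the quantile edges may coincide and `pd.qcut` can raise an error about non-unique bin edges. Ranking with `method="first"` before binning is one common workaround, shown here as an assumption and not as a step the notebook takes.

```python
import pandas as pd

# Invented values: days since last purchase, and purchase counts with many ties
recency = pd.Series([1, 5, 20, 40, 90, 150, 200, 300, 320, 365])
frequency = pd.Series([1, 1, 1, 1, 2, 2, 3, 5, 8, 13])

# Smaller recency is better, so the labels run from 5 down to 1
recency_score = pd.qcut(recency, 5, labels=[5, 4, 3, 2, 1])

# Ranking first breaks ties by position, which keeps the bin edges unique
frequency_score = pd.qcut(
    frequency.rank(method="first"), 5, labels=[1, 2, 3, 4, 5]
)

print(recency_score.astype(int).tolist())
print(frequency_score.astype(int).tolist())
```

Note that ranking assigns tied customers to different bins based on row order, so two customers with the same frequency can receive different scores.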

In [None]:
(rfm['RecencyScore'].astype(str) + 
 rfm['FrequencyScore'].astype(str) + 
 rfm['MonetaryScore'].astype(str)).head()

In [None]:
rfm["RFM_SCORE"] = rfm['RecencyScore'].astype(str) + rfm['FrequencyScore'].astype(str) + rfm['MonetaryScore'].astype(str)

In [None]:
rfm.head()

In [None]:
rfm.describe().T

In [None]:
rfm[rfm["RFM_SCORE"] == "555"].head()

In [None]:
rfm[rfm["RFM_SCORE"] == "111"].head()

In [None]:
seg_map = {
    r'[1-2][1-2]': 'Hibernating',
    r'[1-2][3-4]': 'At Risk',
    r'[1-2]5': 'Can\'t Loose',
    r'3[1-2]': 'About to Sleep',
    r'33': 'Need Attention',
    r'[3-4][4-5]': 'Loyal Customers',
    r'41': 'Promising',
    r'51': 'New Customers',
    r'[4-5][2-3]': 'Potential Loyalists',
    r'5[4-5]': 'Champions'
}

In [None]:
rfm['Segment'] = rfm['RecencyScore'].astype(str) + rfm['FrequencyScore'].astype(str)
rfm['Segment'] = rfm['Segment'].replace(seg_map, regex=True)
rfm.head()
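
The segment mapping above relies on `replace` with `regex=True`: each two-character code (recency score followed by frequency score) is matched against the patterns in `seg_map` and replaced by the segment name. The sketch below repeats the same dictionary on invented codes and checks that all 25 possible codes receive a label. The variable names `codes`, `all_codes` and `mapped` are hypothetical.

```python
import pandas as pd

# Same mapping as in the notebook: first digit is RecencyScore, second is FrequencyScore
seg_map = {
    r'[1-2][1-2]': 'Hibernating',
    r'[1-2][3-4]': 'At Risk',
    r'[1-2]5': 'Can\'t Loose',
    r'3[1-2]': 'About to Sleep',
    r'33': 'Need Attention',
    r'[3-4][4-5]': 'Loyal Customers',
    r'41': 'Promising',
    r'51': 'New Customers',
    r'[4-5][2-3]': 'Potential Loyalists',
    r'5[4-5]': 'Champions'
}

# A few example codes
codes = pd.Series(["55", "12", "33", "15", "41"])
segments = codes.replace(seg_map, regex=True)
print(segments.tolist())

# Coverage check: every combination of scores 1-5 should map to a name,
# so no digits should remain after the replacement.
all_codes = pd.Series([f"{r}{f}" for r in range(1, 6) for f in range(1, 6)])
mapped = all_codes.replace(seg_map, regex=True)
print("unmapped codes:", mapped[mapped.str.contains(r"\d")].tolist())
```

Because the monetary score is not part of the code, segments are determined by recency and frequency alone.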

In [None]:
rfm[["Segment", "Recency","Frequency","Monetary"]].groupby("Segment").agg(["mean","count"])

## Evaluation and Action Proposal

In this table, three customer segments were analyzed, and actions were determined based on that analysis.

1. Can't Loose: This segment consists of 81 customers. The average Recency is 141, the average Frequency is 184, and the average Monetary value is 2346.

2. At Risk: This segment consists of 577 customers. The average Recency is 164, the average Frequency is 57, and the average Monetary value is 950.

3. Need Attention: This segment consists of 208 customers. The average Recency is 50, the average Frequency is 42, and the average Monetary value is 833.

### Comments

The riskiest segment, and the one that most deserves focus, is "Can't Loose". It is the most valuable segment in this table. Its recency score is low (an average of 141 days since the last purchase), yet its Frequency is the highest of the three groups, and its Monetary value is among the highest of all segments. In other words, these customers used to buy often and spend heavily but have not purchased recently, so they are the ones most likely to leave. This is the most important customer segment for the company.

The "At Risk" segment should not be lost either. Its Frequency is also among the highest, and it stands out as the third-largest segment by number of customers. Its recency score is low, which indicates that these customers have not shopped recently. They need to be won back.

In the "Need Attention" segment, Frequency and Recency are both around average. This segment needs attention because it can move in either direction. As a suggested action, promotions, discounts and loyalty incentives could shift these customers toward the "Loyal Customers" or "Potential Loyalists" segments. Despite such efforts, however, they may also drift into the "At Risk" or "About to Sleep" segments.

For these three groups, loyalty should be increased through promotions, discounts and dedicated campaigns. Actions aimed at these groups are expected to be more effective than actions aimed at other segments. For this reason, the records of these three segments should be extracted and shared with the business intelligence department. That department should pull the contact information of these customers and pass it to departments such as purchasing and marketing, which should then engage these customers within their own business areas.

Below, the "Need Attention" segment is previewed, and the customer IDs of the "New Customers" segment are exported to the "new_customers.csv" file. This file is prepared to be sent to the business intelligence department.

In [None]:
rfm[rfm["Segment"] == "Need Attention"].head()

In [None]:
rfm[rfm["Segment"] == "New Customers"].index

In [None]:
new_df = pd.DataFrame()
new_df["NewCustomerID"] = rfm[rfm["Segment"] == "New Customers"].index

In [None]:
new_df.head()

In [None]:
new_df.to_csv("new_customers.csv")
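
The export above covers a single segment. As a sketch of how the three segments discussed in the evaluation could be exported together, the example below filters an invented toy frame with `isin` and writes the result without the row index. The frame `rfm_toy`, the list `target_segments` and the output layout are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in (invented) for the rfm table: customer IDs in the index, one Segment column
rfm_toy = pd.DataFrame(
    {"Segment": ["At Risk", "Champions", "Need Attention", "Can't Loose"]},
    index=pd.Index([12346, 12347, 12348, 12349], name="Customer ID"),
)

# Segment labels as they appear in seg_map
target_segments = ["Can't Loose", "At Risk", "Need Attention"]

# Keep only the targeted segments and turn the index into a regular column
selected = (
    rfm_toy[rfm_toy["Segment"].isin(target_segments)]
    .reset_index()[["Customer ID", "Segment"]]
)

# With no path argument, to_csv returns the CSV text; pass a file name to write to disk
csv_text = selected.to_csv(index=False)
print(csv_text)
```

Passing `index=False` avoids an extra unnamed column of row numbers in the exported file.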