# 1. Introduction

## 1.1 Business Problem

An e-commerce company wants to divide its customers into segments and tailor its marketing strategy to each segment.

To achieve this, the purchasing behavior of each customer is characterized, and customers are grouped according to that behavior.

__Introduction to the Dataset__

https://archive.ics.uci.edu/ml/datasets/Online+Retail+II

The Online Retail II dataset contains the sales of a UK-based online retail company between 01/12/2009 and 09/12/2011.

The company sells souvenirs, which can be thought of as promotional products.

Most of its customers are wholesalers.

__Variables__

- __InvoiceNo:__ Invoice number. A unique number for each transaction (invoice). If this code starts with C, the transaction has been cancelled.
- __StockCode:__ Product code. A unique number for each product.
- __Description:__ Product name.
- __Quantity:__ Number of units of the product sold on the invoice.
- __InvoiceDate:__ Invoice date and time.
- __UnitPrice:__ Product price (in pounds sterling).
- __CustomerID:__ Unique customer number.
- __Country:__ Country name. The country where the customer lives.

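- Note: the Excel sheet covering the years 2010-2011 was used in the analysis. In the CSV file loaded in section 1.4, the columns InvoiceNo, UnitPrice and CustomerID appear as `Invoice`, `Price` and `Customer ID`.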

## 1.2 Recency, Frequency and Monetary Analysis (RFM)

RFM is a method used for analyzing customer value. It is commonly used in database marketing and direct marketing and has received particular attention in retail and professional services industries.

* RFM stands for the three dimensions:

    __Recency__ – How recently did the customer purchase?
    
    __Frequency__ – How often do they purchase?
    
    __Monetary__ – How much do they spend?
    
‘RFM (Market Research)’ (2020) Wikipedia. Available at: https://en.wikipedia.org/wiki/RFM_(market_research)
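
The three dimensions above can be illustrated with a minimal sketch on a hypothetical toy table. The column names, the values, and the choice of reference date (one day after the last transaction) are assumptions made for illustration only and are not taken from the Online Retail II data.

```python
import pandas as pd

# Hypothetical toy transactions (not taken from the Online Retail II data)
toy = pd.DataFrame({
    "customer": ["A", "A", "B", "B", "B", "C"],
    "invoice": ["1", "2", "3", "4", "5", "6"],
    "date": pd.to_datetime(["2011-01-05", "2011-03-01", "2011-02-10",
                            "2011-02-20", "2011-03-09", "2010-12-15"]),
    "amount": [20.0, 35.0, 10.0, 15.0, 5.0, 100.0],
})

# Reference date: the day after the last transaction (an assumption of this sketch)
reference_date = toy["date"].max() + pd.Timedelta(days=1)

toy_rfm = toy.groupby("customer").agg(
    # Recency: days between the reference date and the customer's last purchase
    Recency=("date", lambda d: (reference_date - d.max()).days),
    # Frequency: number of distinct invoices
    Frequency=("invoice", "nunique"),
    # Monetary: total amount spent
    Monetary=("amount", "sum"),
)
print(toy_rfm)
```

In this toy table, customer B is the most recent and most frequent buyer, while customer C spent the most but has not purchased for the longest time.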

## 1.3 Import Libraries

In [None]:
# Data manipulation and analysis
import pandas as pd
# Date
import datetime as dt
# Settings
import warnings

## 1.4 Read Data

In [None]:
df = pd.read_csv('../input/online-retail-ii-uci/online_retail_II.csv')

# 2. Overview

In [None]:
df.info()

In [None]:
df.head()

# 3. Data Preprocessing

In [None]:
# Remove cancelled transactions (invoice numbers starting with "C") from the dataset
df = df[~df["Invoice"].astype(str).str.startswith("C")]
# Remove rows with missing values
df = df.dropna()
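
Since a cancellation is identified by an invoice number that starts with C, it may help to see how a prefix test differs from a substring test. The sketch below uses hypothetical invoice numbers (the value `53C001` is invented for illustration and is not claimed to occur in the dataset).

```python
import pandas as pd

# Hypothetical invoice numbers; "C" marks a cancellation only as a prefix
invoices = pd.Series(["536365", "C536379", "53C001", None])

# Prefix test: True only when the invoice number begins with "C"
starts_with_c = invoices.str.startswith("C", na=False)
# Substring test: True when "C" appears anywhere in the invoice number
contains_c = invoices.str.contains("C", na=False)

print(starts_with_c.tolist())  # [False, True, False, False]
print(contains_c.tolist())     # [False, True, True, False]
```

The two tests agree whenever C only ever appears as the first character, which is why either may give the same result on a given file.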

# 4. RFM Analysis

## 4.1 Recency

In order to find the recency value of each customer, we need to determine the last invoice date as the current date and subtract the last purchasing date of each customer from this date.

In [None]:
df["InvoiceDate"].max() # Last invoice date

In [None]:
today_date = dt.datetime(2011, 12, 9)  # reference date: the day of the last invoice (at midnight)

In [None]:
# The type of Customer ID variable needs to be turned into an integer for following commands.
df["Customer ID"] = df["Customer ID"].astype(int) 

In [None]:
# The type of InvoiceDate variable needs to be turned into datetime for following commands.
df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])

In [None]:
# Take each customer's last invoice date and subtract it from today_date to get the recency
recency = (today_date - df.groupby("Customer ID").agg({"InvoiceDate":"max"}))
# Rename the column to Recency
recency.rename(columns = {"InvoiceDate":"Recency"}, inplace = True)
# Convert the time differences to whole days
recency_df = recency["Recency"].apply(lambda x: x.days)
recency_df.head()
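
One detail of the subtraction above may be worth checking: `today_date` is midnight on 9 December 2011, while invoice timestamps carry a time of day. The sketch below uses hypothetical timestamps (not taken from the dataset) to show what `.days` returns in that case. The shifted reference date is one possible alternative and an assumption of this sketch, not the method used in this notebook.

```python
import datetime as dt
import pandas as pd

today_date = dt.datetime(2011, 12, 9)  # midnight, as in the cell above

# Hypothetical last-purchase timestamps
last_purchase = pd.Series(pd.to_datetime([
    "2011-12-09 12:50:00",
    "2011-12-08 09:15:00",
    "2011-11-30 16:00:00",
]))

# A purchase later on the reference day gives a negative difference
recency_days = (today_date - last_purchase).dt.days
print(recency_days.tolist())  # [-1, 0, 8]

# Assumed alternative: move the reference date one day forward
shifted_days = (today_date + dt.timedelta(days=1) - last_purchase).dt.days
print(shifted_days.tolist())  # [0, 1, 9]
```

Because the quantile-based scoring used later depends only on the ordering of the recency values, a uniform one-day shift would not be expected to change the scores; it only avoids negative day counts.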


## 4.2 Frequency

In order to find the frequency value of each customer, we need to determine how many times the customers make purchases.

In [None]:
# Count the distinct invoice timestamps of each Customer ID and assign the result to freq_df
freq_df = df.groupby("Customer ID").agg({"InvoiceDate":"nunique"})
# Rename the column to Frequency
freq_df.rename(columns={"InvoiceDate": "Frequency"}, inplace=True)
freq_df.head()

## 4.3 Monetary

In order to find the monetary value of each customer, we need to determine how much each customer spends on purchases.

In [None]:
# Multiply the price by the quantity of each purchased product and store the result in TotalPrice
df["TotalPrice"] = df["Quantity"] * df["Price"]

In [None]:
# Sum the total prices for each Customer ID
monetary_df = df.groupby("Customer ID").agg({"TotalPrice":"sum"})
# Rename the TotalPrice column to Monetary
monetary_df.rename(columns={"TotalPrice":"Monetary"}, inplace=True)
monetary_df.head()

## 4.4 Concatenate Recency, Frequency and Monetary

In [None]:
rfm = pd.concat([recency_df, freq_df, monetary_df],  axis=1)
rfm.head()

# 5. Scoring of Recency, Frequency and Monetary Values

In [None]:
# Split the recency values into five quantile groups; the lowest recency gets score 5 and the highest gets score 1
rfm["RecencyScore"] = pd.qcut(rfm["Recency"], 5, labels=[5, 4, 3, 2, 1])
# Split the frequency values into five quantile groups; the lowest frequency gets score 1 and the highest gets score 5
# (ranking with method="first" breaks ties so that the quantile edges are unique)
rfm["FrequencyScore"] = pd.qcut(rfm["Frequency"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5])
# Split the monetary values into five quantile groups; the lowest monetary value gets score 1 and the highest gets score 5
rfm["MonetaryScore"] = pd.qcut(rfm["Monetary"], 5, labels=[1, 2, 3, 4, 5])
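
The frequency score above is computed on ranks rather than on the raw values. A minimal sketch with hypothetical frequency values (invented for illustration) suggests why: when many customers share the same value, the quantile edges coincide and `pd.qcut` raises a `ValueError`, whereas ranking with `method="first"` gives every row a distinct rank.

```python
import pandas as pd

# Hypothetical frequency values with many ties at 1
frequency = pd.Series([1, 1, 1, 1, 1, 1, 2, 3, 5, 8])

try:
    pd.qcut(frequency, 5, labels=[1, 2, 3, 4, 5])
    tie_error = None
except ValueError as err:
    # Several quantile edges are equal to 1, so the bins are not unique
    tie_error = str(err)

# method="first" breaks ties by order of appearance, giving ranks 1..10
scores = pd.qcut(frequency.rank(method="first"), 5, labels=[1, 2, 3, 4, 5])
print(scores.tolist())  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```

A side effect of this choice is that customers with identical frequency can receive different scores, depending on their position in the table.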


In [None]:
# Combining Recency, Frequency, and Monetary Scores in a string format
rfm["RFM_SCORE"] = (rfm['RecencyScore'].astype(str) + 
                    rfm['FrequencyScore'].astype(str) + 
                    rfm['MonetaryScore'].astype(str))

In [None]:
# Customers with best scores
rfm[rfm["RFM_SCORE"]=="555"].head()

In [None]:
# Customers with worst scores
rfm[rfm["RFM_SCORE"]=="111"].head()

# 6. Customer Segmentation

In [None]:
# Mapping of segments according to recency and frequency scores of customers
seg_map = {
    r'[1-2][1-2]': 'Hibernating',
    r'[1-2][3-4]': 'At Risk',
    r'[1-2]5': 'Can\'t Lose',
    r'3[1-2]': 'About to Sleep',
    r'33': 'Need Attention',
    r'[3-4][4-5]': 'Loyal Customers',
    r'41': 'Promising',
    r'51': 'New Customers',
    r'[4-5][2-3]': 'Potential Loyalists',
    r'5[4-5]': 'Champions'
}

In [None]:
# Convert the recency and frequency scores to strings, concatenate them, and assign the result to Segment
rfm['Segment'] = rfm['RecencyScore'].astype(str) + rfm['FrequencyScore'].astype(str)
# Replace the two-digit codes with the segment names defined in seg_map
rfm['Segment'] = rfm['Segment'].replace(seg_map, regex=True)
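
To make the regex-based mapping concrete, here is a minimal sketch on a few hypothetical recency/frequency codes. `subset_map` is an illustrative subset of the patterns in `seg_map`, and the codes themselves are invented.

```python
import pandas as pd

# Hypothetical two-digit codes: first digit is the recency score, second the frequency score
rf_codes = pd.Series(["12", "24", "34", "55", "41"])

# Illustrative subset of the segment patterns
subset_map = {
    r'[1-2][1-2]': 'Hibernating',
    r'[1-2][3-4]': 'At Risk',
    r'[3-4][4-5]': 'Loyal Customers',
    r'41': 'Promising',
    r'5[4-5]': 'Champions',
}

# With regex=True, each key is treated as a regular expression and the
# matching part of the string is replaced by the segment name
segments = rf_codes.replace(subset_map, regex=True)
print(segments.tolist())
# ['Hibernating', 'At Risk', 'Loyal Customers', 'Champions', 'Promising']
```

Since every code is exactly two digits and each pattern consumes both of them, the replacement covers the whole string; with longer strings the patterns would need anchors such as `^` and `$`.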

In [None]:
rfm.head()

In [None]:
# Mean, median, count statistics of different segments
rfm[["Segment","Recency","Frequency", "Monetary"]].groupby("Segment").agg(["mean","median","count"])

Different marketing strategies can be defined for different customer segments. Strategies for three of the segments are outlined below; these can be extended, and customers can be monitored more closely.




__At Risk__

Customers in this group last shopped 371 days ago on average. The group median is 375, which is close to the mean, so this value is fairly consistent across the group. On average, they made 3.89 purchases and spent 1379.64 monetary units. The time since their last purchase is very long, so these customers may be lost. The focus should be on why they have not shopped for so long; customer dissatisfaction is one possible cause. Their shopping experience can be examined by sending a survey by mail. If there is no dissatisfaction, a reminder can be sent, and incentives such as discount codes may be offered to encourage them to shop again.

__Need Attention__


Customers in this group last shopped 112 days ago on average. The group median is 105, which is close to the mean, so this value is fairly consistent across the group. On average, they made 3.14 purchases and spent 1276.34 monetary units. This group is less risky than the At Risk group, since the last purchase date is relatively recent. Special offers can be made on the products these customers buy that are consumed fastest. This can shorten the average time between their visits.

__Potential Loyalists__

Customers in this group last shopped 24 days ago on average. The group median is 22, which is close to the mean, so this value is fairly consistent across the group. On average, they made 2.58 purchases and spent 1158.27 monetary units. With support, customers in this group can move into the Loyal Customers group. They can therefore be monitored closely, and their satisfaction can be increased with one-to-one phone calls. In addition, options such as free shipping can be offered to increase the average amount spent.
