## 1. Domain Introduction: 
### Customer Segmentation:
Customer segmentation is the practice of dividing a customer base into groups of individuals that are similar in specific ways relevant to marketing, such as age, gender, interests and spending habits.

Customer segmentation relies on identifying key differentiators that divide customers into groups that can be targeted. The following kinds of information about a customer are taken into account when determining customer segmentation practices:

- Demographics (age, race, religion, gender, family size, ethnicity, income, education level)
- Geography (where they live and work)
- Psychographics (social class, lifestyle and personality characteristics)
- Behavioral tendencies (spending, consumption, usage and desired benefits)

In this Data Tale we will focus on behavioral tendencies in order to group customers into the following categories:
- Loyal Customers
- Best Customers
- Big Spender Customers
- High Breadth Customers (those who buy a large range of products)
- Almost Lost Customers
- Lost Customers
- Lost Cheap Customers


#### RFM Metrics:
RFM stands for Recency, Frequency and Monetary. These metrics are widely used by organisations for behavioral segmentation of their customers.

#### Recency
- How long ago was the last purchase (in number of days)?
- Measured relative to the “As Of Date” of the data set, which is taken as max(Invoice_Date) in our dataset

#### Frequency
- How many orders in analysis period?
- Attempting to measure engagement

#### Monetary
- What is total monetary value of all orders in analysis period?

I have also introduced Breadth as a metric in our analysis, defined below:

#### Breadth
- How many different kinds of products were purchased by the customer?

Hence our metrics can be termed **RFMB Metrics**.
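To make the four definitions concrete, here is a minimal sketch that computes them by hand for a toy set of invoice lines. The data and the helper name `rfmb` are hypothetical, and the sketch takes the definitions literally (distinct orders for frequency, distinct products for breadth, as-of date equal to the latest invoice date). The notebook's own function further down adds one day to the as-of date and counts rows instead.

```python
from datetime import date

# Hypothetical toy data, one tuple per invoice line:
# (customer, invoice number, stock code, invoice date, line total)
lines = [
    ("A", "INV1", "P1", date(2011, 12, 1), 10.0),
    ("A", "INV1", "P2", date(2011, 12, 1), 5.0),
    ("A", "INV2", "P1", date(2011, 12, 8), 10.0),
    ("B", "INV3", "P3", date(2011, 11, 9), 40.0),
]

# "As of" date: the latest invoice date in the data set
as_of = max(row[3] for row in lines)


def rfmb(customer):
    """Return (recency, frequency, monetary, breadth) for one customer."""
    rows = [row for row in lines if row[0] == customer]
    recency = (as_of - max(row[3] for row in rows)).days  # days since last purchase
    frequency = len({row[1] for row in rows})             # number of distinct orders
    monetary = sum(row[4] for row in rows)                # total amount spent
    breadth = len({row[2] for row in rows})               # number of distinct products
    return recency, frequency, monetary, breadth


print(rfmb("A"))  # (0, 2, 25.0, 2)
print(rfmb("B"))  # (29, 1, 40.0, 1)
```

Customer A bought twice, most recently on the as-of date, and touched two products. Customer B bought once, 29 days earlier, and touched one product.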


In [1]:
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
from datetime import timedelta

### The Online Retail dataset is imported from Kaggle using a Kaggle username and API key




In [2]:
#creating a directory
!mkdir ~/.kaggle
#using credentials to open kaggle dataset
import json
token = {"username":"enter your username","key":"enter your key"}
with open('/root/.kaggle/kaggle.json', 'w') as file:
    json.dump(token, file)
!cp /root/.kaggle/kaggle.json ~/.kaggle/kaggle.json
!kaggle config set -n path -v{/content}
!chmod 600 /root/.kaggle/kaggle.json
#downloading the zipped dataset folder on cloud
!kaggle datasets download -d mrmining/online-retail -p /content    

cp: '/root/.kaggle/kaggle.json' and '/root/.kaggle/kaggle.json' are the same file
- path is now set to: {/content}
Downloading online-retail.zip to /content
 78% 17.0M/21.9M [00:00<00:00, 31.5MB/s]
100% 21.9M/21.9M [00:00<00:00, 37.2MB/s]


In [3]:
#unzipping the folder
!unzip \*.zip

Archive:  online-retail.zip
  inflating: Online Retail.xlsx      


**prep_for_rfmb function** 
<br>
A function that preprocesses the data: it removes rows with a null customer ID, keeps only rows with positive quantity and unit price, and adds a total price column (quantity × unit price).

In [4]:
# Preprocess the data: drop rows with a null customer ID, keep only rows with
# positive quantity and unit price, and add a total price column
def prep_for_rfmb(dataframe, cust_id='CustomerID', quantity='Quantity', price='UnitPrice', total_price='TotalPrice'):
    dataframe = dataframe[pd.notnull(dataframe[cust_id])]
    for var in [quantity, price]:
        dataframe = dataframe[dataframe[var] > 0]
    # Copy so that adding a column does not write into a slice of the caller's frame
    dataframe = dataframe.copy()
    dataframe[total_price] = dataframe[quantity] * dataframe[price]
    return dataframe

In [5]:
# InvoiceDate is converted to a datetime type so that date arithmetic can be performed
def create_rfmb_table(dataframe, cust_id='CustomerID', invoice_date='InvoiceDate', total_price='TotalPrice'):
    dataframe[invoice_date] = pd.to_datetime(dataframe[invoice_date])
    # Reference date: one day after the latest invoice in the data
    NOW = dataframe[invoice_date].max() + timedelta(days=1)
    # Creating the rfmb table, one row per customer
    rfmbTable = dataframe.groupby(cust_id).agg(
    {
        invoice_date: lambda x: (NOW - x.max()).days,  # recency: days between the last purchase and the reference date
        'StockCode': lambda x: x.count(),  # breadth: counts purchased line items, not distinct products
        'InvoiceNo': lambda x: len(x),  # frequency: counts invoice lines, not distinct invoices
        total_price: lambda x: x.sum()  # monetary value: total amount spent
     }
    )
    rfmbTable[invoice_date] = rfmbTable[invoice_date].astype(int)
    rfmbTable.rename(
    columns={
        invoice_date: 'recency',
        'InvoiceNo': 'frequency',
        total_price: 'monetary_value',
        'StockCode': 'breadth'},
    inplace=True
    )
    return rfmbTable
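Note that the aggregation above counts rows for both `StockCode` and `InvoiceNo`, so breadth and frequency come out identical for every customer. If the intent is to count distinct products and distinct orders, as the metric definitions suggest, one option is pandas' `nunique`. The following is a sketch on hypothetical toy data, not the computation that produced the outputs in this notebook:

```python
import pandas as pd

# Hypothetical toy data: customer 1 has 3 invoice lines across 2 orders
# and 2 distinct products; customer 2 has a single line
toy = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2],
    "InvoiceNo": ["A1", "A1", "A2", "B1"],
    "StockCode": ["X", "Y", "X", "Z"],
})

per_customer = toy.groupby("CustomerID").agg(
    line_items=("StockCode", "count"),           # what count()/len() measure
    distinct_products=("StockCode", "nunique"),  # breadth as defined in the text
    distinct_orders=("InvoiceNo", "nunique"),    # frequency as defined in the text
)
print(per_customer)
```

For customer 1 the row count is 3, while the distinct counts are 2 products and 2 orders, which shows how the two approaches can diverge.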

## Creating Quantiles of Recency, Frequency, Monetary and Breadth values for all customers

### **R Metrics**
#### A lower recency value means that the customer ordered more recently, so the lower the recency, the better. Hence the lowest recency values are placed in the 1st quartile and higher values in the subsequent quartiles.

### **FMB Metrics**
#### Higher frequency, monetary and breadth values mean that the customer orders more often, spends more, or buys a wider range of products, so the higher the FMB metrics, the better. Hence the highest values of these metrics are placed in the 1st quartile and lower values in the subsequent quartiles.
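The scoring rule can be sketched in plain Python as follows. The helper name `quartile_score` and the sample values are hypothetical. The cut points come from `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between data points. Quartile 1 is always the best group, whichever direction the metric runs.

```python
from statistics import quantiles


def quartile_score(value, cuts, lower_is_better):
    """Return a quartile from 1 (best) to 4 (worst) given three cut points."""
    q25, q50, q75 = cuts
    if value <= q25:
        rank = 1
    elif value <= q50:
        rank = 2
    elif value <= q75:
        rank = 3
    else:
        rank = 4
    # For recency a low value is good; for frequency, monetary value and
    # breadth a high value is good, so the ranking is flipped
    return rank if lower_is_better else 5 - rank


# Hypothetical sample values for eight customers
recency = [2, 8, 19, 43, 75, 181, 278, 326]
frequency = [1, 7, 10, 12, 17, 31, 70, 182]

r_cuts = quantiles(recency, n=4, method="inclusive")
f_cuts = quantiles(frequency, n=4, method="inclusive")

r_scores = [quartile_score(v, r_cuts, lower_is_better=True) for v in recency]
f_scores = [quartile_score(v, f_cuts, lower_is_better=False) for v in frequency]

print(r_scores)  # [1, 1, 2, 2, 3, 3, 4, 4]
print(f_scores)  # [4, 4, 3, 3, 2, 2, 1, 1]
```

The most recent buyers and the most frequent buyers both end up in quartile 1, even though one metric is ranked ascending and the other descending.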

## Defining Segments of Customers on the basis of RFMB Metrics

### **Best Customers**
#### The best ones are in the first quartile of the recency, frequency and monetary metrics

### **Loyal Customers**
#### The loyal customers are those who buy from us frequently, i.e. those in the first quartile of the frequency metric

### **Big Spender Customers**
#### The big spenders are those in the first quartile of the monetary metric

### **High Breadth Customers**
#### The customers who buy a large range of products, i.e. those in the first quartile of the breadth metric

### **Almost Lost Customers**
#### The almost lost ones are in the third quartile of recency but in the first quartile of frequency and monetary

### **Lost Customers**
#### The lost ones are in the fourth quartile of recency, but they used to buy frequently from us and brought us monetary benefits (first quartile of frequency and monetary)

### **Lost Cheap Customers**
#### The lost cheap ones are customers we have lost who neither bought frequently nor spent much: they are in the fourth quartile of recency, frequency and monetary
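The segment rules can be summarised as a small lookup on the four quartile scores. The helper `segment_labels` below is a hypothetical sketch that mirrors the quartile combinations used by `segment_rfmb_table` in this notebook. It also makes explicit that segments overlap, so one customer can carry several labels.

```python
def segment_labels(r, f, m, b):
    """Return the segment names matching one customer's quartile scores."""
    labels = []
    if (r, f, m) == (1, 1, 1):
        labels.append("best")
    if (r, f, m) == (3, 1, 1):
        labels.append("almost_lost")
    if (r, f, m) == (4, 1, 1):
        labels.append("lost")
    if (r, f, m) == (4, 4, 4):
        labels.append("lost_cheap")
    if f == 1:
        labels.append("loyal")
    if m == 1:
        labels.append("big_spender")
    if b == 1:
        labels.append("ranged")  # high breadth customers
    return labels


print(segment_labels(3, 1, 1, 1))  # ['almost_lost', 'loyal', 'big_spender', 'ranged']
print(segment_labels(4, 4, 4, 4))  # ['lost_cheap']
print(segment_labels(2, 2, 2, 2))  # []
```

A customer with mid-range scores on every metric falls into none of the segments, so the segments do not partition the customer base.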



In [6]:
def segment_rfmb_table(rfmb_table):
    
    segments = {}
    quantiles = rfmb_table.quantile(q=[0.25,0.5,0.75])
    quantiles = quantiles.to_dict()
    #A lower recency value means that the customer ordered more recently, so the lower the recency, the better.
    # Hence the lowest recency values are placed in the 1st quartile and higher values in the subsequent quartiles.
    def RScore(x,p,d):
        if x <= d[p][0.25]:
            return 1
        elif x <= d[p][0.50]:
            return 2
        elif x <= d[p][0.75]: 
            return 3
        else:
            return 4
    #Higher frequency, monetary and breadth values mean that the customer orders more often, spends more,
    # or buys a wider range of products, so the higher the FMB metrics, the better.
    # Hence the highest values of these metrics are placed in the 1st quartile and lower values in the subsequent quartiles.
    def FMBScore(x,p,d):
        if x <= d[p][0.25]:
            return 4
        elif x <= d[p][0.50]:
            return 3
        elif x <= d[p][0.75]: 
            return 2
        else:
            return 1
    
    segmented_rfmb = rfmb_table.copy()  #work on a copy so that the input table is not modified
    #Recency, Frequency, Monetary and Breadth quartiles for each customer
    segmented_rfmb['r_quartile'] = segmented_rfmb['recency'].apply(RScore, args=('recency',quantiles,))
    segmented_rfmb['f_quartile'] = segmented_rfmb['frequency'].apply(FMBScore, args=('frequency',quantiles,))
    segmented_rfmb['m_quartile'] = segmented_rfmb['monetary_value'].apply(FMBScore, args=('monetary_value',quantiles,))
    segmented_rfmb['b_quartile'] = segmented_rfmb['breadth'].apply(FMBScore, args=('breadth',quantiles,))
    #The rfmb score is the four quartile scores concatenated as strings, in the order R, F, M, B
    segmented_rfmb['rfmbScore'] = segmented_rfmb.r_quartile.map(str) + segmented_rfmb.f_quartile.map(str) + segmented_rfmb.m_quartile.map(str)+ segmented_rfmb.b_quartile.map(str)
    
    def create_segment(segmented_rfmb,r_quartile=1,f_quartile=1,m_quartile=1):
        step1 = segmented_rfmb[segmented_rfmb['r_quartile']==r_quartile].sort_values('monetary_value', ascending=False)
        step2 = step1[step1['f_quartile']==f_quartile].sort_values('monetary_value', ascending=False)
        step3 = step2[step2['m_quartile']==m_quartile].sort_values('monetary_value', ascending=False) 
        return step3
    #Here we create segments for our best, almost_lost, lost, lost_cheap, loyal, big spender and high breadth customers
    #The best ones are in the first quartile of recency, frequency and monetary
    segments['best'] = create_segment(segmented_rfmb=segmented_rfmb,r_quartile=1,f_quartile=1,m_quartile=1)
    #The almost lost ones are in the third quartile of recency category
    segments['almost_lost'] = create_segment(segmented_rfmb=segmented_rfmb,r_quartile=3,f_quartile=1,m_quartile=1)
    #The lost ones are in the fourth quartile of recency category who used to buy frequently from us and we had monetary benefits 
    segments['lost'] = create_segment(segmented_rfmb=segmented_rfmb,r_quartile=4,f_quartile=1,m_quartile=1)
    #The lost_cheap ones are those whom we have lost but they were neither frequent nor gave us monetary benefits
    # They are in the fourth quartile of all categories
    segments['lost_cheap'] = create_segment(segmented_rfmb=segmented_rfmb,r_quartile=4,f_quartile=4,m_quartile=4)
    #The loyal customers are those who buy from us frequently
    segments['loyal'] = segmented_rfmb[segmented_rfmb['f_quartile']==1].sort_values('monetary_value', ascending=False)
    #The big spenders are those who are in first quartile of monetary
    segments['big_spender'] = segmented_rfmb[segmented_rfmb['m_quartile']==1].sort_values('monetary_value', ascending=False)
    #The customers who buy a large range of products
    segments['ranged'] = segmented_rfmb[segmented_rfmb['b_quartile']==1].sort_values('monetary_value', ascending=False)
  

    return segments

In [7]:
def main(raw_data):
    df_clean = prep_for_rfmb(dataframe=raw_data)
    rfmb_table = create_rfmb_table(dataframe=df_clean)
    print(rfmb_table)
    segmented_rfmb = segment_rfmb_table(rfmb_table=rfmb_table)
    return segmented_rfmb

In [8]:
#opening the dataframe
df = pd.read_excel('Online Retail.xlsx')
df.head()

Unnamed: 0,InvoiceNo,StockCode,lower,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536365,85123A,white hanging heart t-light holder,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850.0,United Kingdom
1,536365,71053,white metal lantern,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
2,536365,84406B,cream cupid hearts coat hanger,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850.0,United Kingdom
3,536365,84029G,knitted union flag hot water bottle,KNITTED UNION FLAG HOT WATER BOTTLE,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
4,536365,84029E,red woolly hottie white heart.,RED WOOLLY HOTTIE WHITE HEART.,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom


In [12]:
if __name__=="__main__":
    
    segmented_rfmb = main(raw_data=df)
    
    print("""
    **************************************
    RFMB MARKETING STRATEGY
    **************************************
    
    The following strategies are recommended for rfmb:
    
    BEST CUSTOMERS: no price incentives, new products, and loyalty programs
    LOYAL CUSTOMERS: Use frequency and monetary metrics to segment further
    BIG SPENDERS: Market the most expensive products
    ALMOST LOST & LOST: Aggressive price incentives
    LOST CHEAP CUSTOMERS: Don't spend too many resources trying to acquire
    
    
    
    """)

            recency  breadth  frequency  monetary_value
CustomerID                                             
12346.0         326        1          1        77183.60
12347.0           2      182        182         4310.00
12348.0          75       31         31         1797.24
12349.0          19       73         73         1757.55
12350.0         310       17         17          334.40
...             ...      ...        ...             ...
18280.0         278       10         10          180.60
18281.0         181        7          7           80.82
18282.0           8       12         12          178.05
18283.0           4      756        756         2094.88
18287.0          43       70         70         1837.28

[4338 rows x 4 columns]

    **************************************
    RFMB MARKETING STRATEGY
    **************************************
    
    The following strategies are recommended for rfmb:
    
    BEST CUSTOMERS: no price incentives, new products, and loyalty 

**segmented_rfmb** is a dictionary keyed by segment name (**best, almost_lost, lost, lost_cheap, loyal, big_spender and ranged**, the last being the high breadth customers). Each entry holds the recency, frequency, monetary and breadth metrics of the customers in that segment, along with the **quartiles** for those metrics.

In [13]:
segmented_rfmb

{'almost_lost':             recency  breadth  frequency  ...  m_quartile  b_quartile  rfmbScore
 CustomerID                               ...                                   
 12744.0          52      222        222  ...           1           1       3111
 12409.0          79      109        109  ...           1           1       3111
 16180.0         100      162        162  ...           1           1       3111
 14952.0          60      138        138  ...           1           1       3111
 16745.0          87      357        357  ...           1           1       3111
 ...             ...      ...        ...  ...         ...         ...        ...
 15220.0          52      117        117  ...           1           1       3111
 15241.0          67      110        110  ...           1           1       3111
 12843.0          66      103        103  ...           1           1       3111
 15549.0          77      115        115  ...           1           1       3111
 12635.0     