<h1 align=center style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazirmatn" color="#0099cc">
Customer Valuation
</font>
</h1>

## Customer Segmentation Using RFM Analysis

As part of the marketing effort, customers are segmented by their value to the company using RFM (Recency, Frequency, Monetary Value), a well-known method in customer behavior analysis. This approach calculates three key metrics for each customer:

1. **Recency**: The number of days since the customer’s last purchase.
2. **Frequency**: The total number of purchases made by the customer.
3. **Monetary Value**: The total revenue generated from a particular customer.

These metrics are key indicators of customer behavior: customers who purchased recently are more likely to buy again, frequent buyers are generally more satisfied, and customers with a higher monetary value contribute more revenue than others.
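
As a rough illustration of the three definitions, the sketch below computes them on a small made-up table. The toy rows, the `Total` and `DaysSince` helper columns, and the choice of analysis date (one day after the latest purchase) are assumptions for this example, not part of the dataset.

```python
import pandas as pd

# Hypothetical toy transactions; column names mirror the sales dataset.
toy = pd.DataFrame({
    "CustomerId": [1, 1, 1, 2],
    "InvoiceNumber": ["A", "A", "B", "C"],
    "InvoiceDate": pd.to_datetime(
        ["2010-01-01", "2010-01-01", "2010-01-05", "2010-01-03"]
    ),
    "Quantity": [2, 1, 3, 10],
    "UnitPrice": [5.0, 10.0, 1.0, 2.0],
})

# Assumed analysis date: the day after the latest purchase in the toy data.
analysis_date = toy["InvoiceDate"].max() + pd.Timedelta(days=1)

# Helper columns: revenue per line item and age of each purchase in days.
toy["Total"] = toy["Quantity"] * toy["UnitPrice"]
toy["DaysSince"] = (analysis_date - toy["InvoiceDate"]).dt.days

toy_rfm = toy.groupby("CustomerId").agg(
    Recency=("DaysSince", "min"),            # days since the latest purchase
    Frequency=("InvoiceNumber", "nunique"),  # number of distinct invoices
    MonetaryValue=("Total", "sum"),          # total revenue
)
print(toy_rfm)
```

Customer 1 has three line items but only two invoices, so their Frequency is 2 rather than 3.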


In [None]:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

In [66]:
df = pd.read_csv('preprocessed_sales.csv')
df

Unnamed: 0,InvoiceNumber,ProductCode,ProductName,Quantity,InvoiceDate,UnitPrice,CustomerId,Country
0,489434,85048,15CM CHRISTMAS GLASS BALL 20 LIGHTS,12,2009-12-01 07:45:00,6.95,13085.0,United Kingdom
1,489434,79323P,PINK CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085.0,United Kingdom
2,489434,79323W,WHITE CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085.0,United Kingdom
3,489434,22041,"RECORD FRAME 7"" SINGLE SIZE",48,2009-12-01 07:45:00,2.10,13085.0,United Kingdom
4,489434,21232,STRAWBERRY CERAMIC TRINKET BOX,24,2009-12-01 07:45:00,1.25,13085.0,United Kingdom
...,...,...,...,...,...,...,...,...
400911,538171,22271,FELTCRAFT DOLL ROSIE,2,2010-12-09 20:01:00,2.95,17530.0,United Kingdom
400912,538171,22750,FELTCRAFT PRINCESS LOLA DOLL,1,2010-12-09 20:01:00,3.75,17530.0,United Kingdom
400913,538171,22751,FELTCRAFT PRINCESS OLIVIA DOLL,1,2010-12-09 20:01:00,3.75,17530.0,United Kingdom
400914,538171,20970,PINK FLORAL FELTCRAFT SHOULDER BAG,2,2010-12-09 20:01:00,3.75,17530.0,United Kingdom


## Recency Metric

To calculate the **Recency** metric for each customer (`CustomerId`), the number of days since their last purchase needs to be determined. This is the difference between the date of analysis and the date of that customer's most recent purchase. It's assumed that the analysis is being done on the day following the most recent purchase in the dataset.
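
One detail worth keeping in mind is that the invoice dates carry a time of day, and the `.days` attribute of a time difference keeps only whole days. The sketch below uses two hypothetical customers and an assumed analysis timestamp to show how this differs from comparing calendar dates only.

```python
import pandas as pd

# Assumed analysis timestamp and hypothetical last-purchase times.
analysis_time = pd.Timestamp("2010-12-10 20:01:00")
last_purchase = pd.Series(
    pd.to_datetime(["2010-12-09 20:01:00", "2010-12-08 21:00:00"]),
    index=["customer_a", "customer_b"],
)

# Whole days only: a gap of 1 day 23 hours counts as 1.
recency_by_timestamp = (analysis_time - last_purchase).dt.days

# Alternative: drop the time of day and compare calendar dates.
recency_by_date = (
    analysis_time.normalize() - last_purchase.dt.normalize()
).dt.days

print(recency_by_timestamp.tolist(), recency_by_date.tolist())
```

Either convention can be reasonable; what matters is applying the same one to every customer.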


In [67]:
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])
# Analysis date: one day after the most recent purchase in the dataset
last_day = df['InvoiceDate'].max() + pd.Timedelta(days=1)
last_day

Timestamp('2010-12-10 20:01:00')

In [68]:
# Most recent purchase per customer (groupby returns customers sorted by CustomerId)
last_purchase = df.groupby('CustomerId')['InvoiceDate'].max()
# Whole days between the analysis date and each customer's last purchase
df_customer_segments = (last_day - last_purchase).dt.days.rename('Recency').reset_index()
df_customer_segments

Unnamed: 0,CustomerId,Recency
0,12346.0,165
1,12347.0,3
2,12348.0,74
3,12349.0,43
4,12351.0,11
...,...,...
4307,18283.0,18
4308,18284.0,67
4309,18285.0,296
4310,18286.0,112


## Frequency Metric

Next, we need to calculate how many **unique** purchases (distinct invoices) each customer has made. The result should be stored in a column named `Frequency` within the `df_customer_segments` dataframe.
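
Because each invoice spans several rows (one per product), counting rows would overstate the number of purchases. A small sketch on made-up line items shows the difference between counting rows and counting distinct invoice numbers:

```python
import pandas as pd

# Hypothetical line items: invoice "A" contains two products.
lines = pd.DataFrame({
    "CustomerId": [1, 1, 1, 2],
    "InvoiceNumber": ["A", "A", "B", "C"],
})

# Counts every row, so invoice "A" is counted twice for customer 1.
rows_per_customer = lines.groupby("CustomerId")["InvoiceNumber"].count()

# Counts each invoice once per customer.
invoices_per_customer = lines.groupby("CustomerId")["InvoiceNumber"].nunique()

print(rows_per_customer.to_dict(), invoices_per_customer.to_dict())
```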


In [69]:
# Keep one row per invoice so that each purchase is counted once
invoices = df.drop_duplicates(subset=['InvoiceNumber'])
frequency = invoices.groupby('CustomerId', as_index=False)['InvoiceNumber'].count()
frequency = frequency.rename(columns={'InvoiceNumber': 'Frequency'})

df_customer_segments = pd.merge(df_customer_segments, frequency, on='CustomerId')
df_customer_segments

Unnamed: 0,CustomerId,Recency,Frequency
0,12346.0,165,11
1,12347.0,3,2
2,12348.0,74,1
3,12349.0,43,3
4,12351.0,11,1
...,...,...,...
4307,18283.0,18,6
4308,18284.0,67,1
4309,18285.0,296,1
4310,18286.0,112,2


## Monetary Value Metric

For this metric, we need to calculate the total purchase amount for each customer, that is, the sum of `UnitPrice * Quantity` over all of their purchases.


In [70]:
# Revenue of each line item
df['total'] = df['UnitPrice'] * df['Quantity']
# Total revenue per customer
monetary = df.groupby('CustomerId', as_index=False)['total'].sum()
monetary = monetary.rename(columns={'total': 'MonetaryValue'})

df_customer_segments = pd.merge(df_customer_segments, monetary, on='CustomerId')
df_customer_segments.head()

Unnamed: 0,CustomerId,Recency,Frequency,MonetaryValue
0,12346.0,165,11,372.86
1,12347.0,3,2,1323.32
2,12348.0,74,1,222.16
3,12349.0,43,3,2671.14
4,12351.0,11,1,300.93


## Grouping Each Metric

Now, you need to categorize each of the three metrics into 4 different groups based on the first, second, and third quartiles, as shown below:

| **Group Number** | **Condition** |
| :---: | :---: |
| 1 | `value <= Q1` |
| 2 | `Q1 < value <= Q2` |
| 3 | `Q2 < value <= Q3` |
| 4 | `Q3 < value` |

For each of the three metrics, perform this categorization and store the results in three additional columns: 
`F_quartile`, `R_quartile`, and `M_quartile`. 
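
To make the boundary rules in the table concrete, here is a minimal sketch on made-up values. The helper name `to_quartile_group` is hypothetical, and `np.searchsorted` is only one possible way to apply the thresholds; with `side="left"` a value equal to a threshold falls into the lower group, matching the `<=` conditions above.

```python
import numpy as np
import pandas as pd

def to_quartile_group(values, thresholds):
    """Return 1-4 for each value given ascending thresholds (Q1, Q2, Q3)."""
    # side="left" yields the number of thresholds strictly below each value,
    # so value <= Q1 gives 0, Q1 < value <= Q2 gives 1, and so on.
    return np.searchsorted(thresholds, values, side="left") + 1

# Hypothetical metric values
metric = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])
q1, q2, q3 = np.percentile(metric, [25, 50, 75])
groups = to_quartile_group(metric, [q1, q2, q3])
print(groups.tolist())

# Values that sit exactly on a threshold stay in the lower group.
edge_groups = to_quartile_group([10, 11, 20, 30, 31], [10, 20, 30])
print(edge_groups.tolist())
```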



In [72]:
def quartile_group(values):
    """Assign each value to group 1-4 using the quartiles of the column."""
    q1, q2, q3 = np.percentile(values, [25, 50, 75])
    # Conditions are checked in order, so each value gets the first group it fits
    conditions = [values <= q1, values <= q2, values <= q3]
    return np.select(conditions, [1, 2, 3], default=4)

df_customer_segments['R_quartile'] = quartile_group(df_customer_segments['Recency'])
df_customer_segments['F_quartile'] = quartile_group(df_customer_segments['Frequency'])
df_customer_segments['M_quartile'] = quartile_group(df_customer_segments['MonetaryValue'])

df_customer_segments.head()

Unnamed: 0,CustomerId,Recency,Frequency,MonetaryValue,R_quartile,F_quartile,M_quartile
0,12346.0,165,11,372.86,4,4,2
1,12347.0,3,2,1323.32,1,2,3
2,12348.0,74,1,222.16,3,1,1
3,12349.0,43,3,2671.14,2,3,4
4,12351.0,11,1,300.93,1,1,1


In [73]:
df_customer_segments['R_quartile']=df_customer_segments['R_quartile'].astype(int).astype(str)
df_customer_segments['F_quartile']=df_customer_segments['F_quartile'].astype(int).astype(str)
df_customer_segments['M_quartile']=df_customer_segments['M_quartile'].astype(int).astype(str)
df_customer_segments.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4312 entries, 0 to 4311
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   CustomerId     4312 non-null   float64
 1   Recency        4312 non-null   int64  
 2   Frequency      4312 non-null   int64  
 3   MonetaryValue  4312 non-null   float64
 4   R_quartile     4312 non-null   object 
 5   F_quartile     4312 non-null   object 
 6   M_quartile     4312 non-null   object 
dtypes: float64(2), int64(2), object(3)
memory usage: 235.9+ KB


## RFM Metric

Now, you can calculate the final **RFM** value by concatenating the values of the columns `[R_quartile, F_quartile, M_quartile]` and store the result in a column named **RFM**. For example, if the values in these columns for a customer are `[1, 2, 3]`, then the final **RFM** value will be `123`.
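
The combination is a string concatenation, not a sum. A short sketch with two hypothetical customers shows why the quartile columns are converted to strings first:

```python
import pandas as pd

# Hypothetical quartile groups for two customers
quartiles = pd.DataFrame({
    "R_quartile": [1, 4],
    "F_quartile": [2, 4],
    "M_quartile": [3, 2],
})

# Adding the integer columns directly would give 1 + 2 + 3 = 6.
numeric_sum = quartiles["R_quartile"] + quartiles["F_quartile"] + quartiles["M_quartile"]

# Converting to strings first makes "+" join the digits instead.
as_text = quartiles.astype(str)
rfm_code = as_text["R_quartile"] + as_text["F_quartile"] + as_text["M_quartile"]

print(numeric_sum.tolist(), rfm_code.tolist())
```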


In [74]:
df_customer_segments['RFM'] = df_customer_segments['R_quartile']+df_customer_segments['F_quartile']+df_customer_segments['M_quartile']
df_customer_segments.head()


Unnamed: 0,CustomerId,Recency,Frequency,MonetaryValue,R_quartile,F_quartile,M_quartile,RFM
0,12346.0,165,11,372.86,4,4,2,442
1,12347.0,3,2,1323.32,1,2,3,123
2,12348.0,74,1,222.16,3,1,1,311
3,12349.0,43,3,2671.14,2,3,4,234
4,12351.0,11,1,300.93,1,1,1,111


## Customer Segmentation

Now, based on the values in the **RFM** column, you should group the customers according to the table below and store their group name in a column called **Segment**. 

### Segmentation Table

| **Segment**          | **RFM** |
| :-------------------: | :-----: |
| Best                 | `144`   |
| AlmostLost           | `344`   |
| LostBigSpenders      | `444`   |
| LostCheap            | `441`   |
| Loyal                | `X4X`   |
| BigSpenders          | `XX4`   |

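In the table, `X` stands for any quartile value, and the rows are checked from top to bottom, so a customer is assigned the first segment that matches. If a customer does not fit into any of the groups in the table, label their group as *Normal*.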
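
One possible way to express the table directly in code is a list of rules checked in order. This is only a sketch: `SEGMENT_RULES`, `matches`, and `segment_for` are hypothetical names, and `X` is treated as a wildcard that matches any digit.

```python
# Rules in priority order, copied from the segmentation table.
SEGMENT_RULES = [
    ("144", "Best"),
    ("344", "AlmostLost"),
    ("444", "LostBigSpenders"),
    ("441", "LostCheap"),
    ("X4X", "Loyal"),
    ("XX4", "BigSpenders"),
]

def matches(pattern, code):
    """True if every pattern character is 'X' or equals the code's digit."""
    return len(pattern) == len(code) and all(
        p in ("X", c) for p, c in zip(pattern, code)
    )

def segment_for(code):
    """Return the first matching segment name, or 'Normal' if none match."""
    for pattern, name in SEGMENT_RULES:
        if matches(pattern, code):
            return name
    return "Normal"

for code in ["144", "442", "244", "234", "123"]:
    print(code, segment_for(code))
```

Because the rules are tried in order, a code such as `244` is labeled *Loyal* even though it also matches `XX4`.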


In [96]:
def group_rfm(rfm):
    """Map an RFM code such as '144' to its segment name."""
    # Exact codes are checked first, then the wildcard patterns X4X and XX4
    if rfm == '144':
        return 'Best'
    elif rfm == '344':
        return 'AlmostLost'
    elif rfm == '444':
        return 'LostBigSpenders'
    elif rfm == '441':
        return 'LostCheap'
    elif rfm[1] == '4':
        return 'Loyal'
    elif rfm[2] == '4':
        return 'BigSpenders'
    else:
        return 'Normal'

df_customer_segments['Segment'] = df_customer_segments['RFM'].apply(group_rfm)
df_customer_segments

Unnamed: 0,CustomerId,Recency,Frequency,MonetaryValue,R_quartile,F_quartile,M_quartile,RFM,Segment
0,12346.0,165,11,372.86,4,4,2,442,Loyal
1,12347.0,3,2,1323.32,1,2,3,123,Normal
2,12348.0,74,1,222.16,3,1,1,311,Normal
3,12349.0,43,3,2671.14,2,3,4,234,BigSpenders
4,12351.0,11,1,300.93,1,1,1,111,Normal
...,...,...,...,...,...,...,...,...,...
4307,18283.0,18,6,619.37,1,4,2,142,Loyal
4308,18284.0,67,1,461.68,3,1,2,312,Normal
4309,18285.0,296,1,427.00,4,1,2,412,Normal
4310,18286.0,112,2,1296.43,3,2,3,323,Normal
