# Full Customer Lifetime Value & RFM Analysis using Machine Learning
Business: M-Pesa
---
**Dataset:** 

Private

**Analysis:**

Build cohorts based on each user's account creation date and forecast their future behaviour.
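
As a rough illustration of the cohort idea, the sketch below groups a toy table by the calendar month of account creation. The toy data, the monthly granularity, and the `cohort` and `cohort_summary` names are assumptions made for the example; the column names follow the ones this notebook assigns after renaming.

```python
import pandas as pd

# Toy stand-in for the private dataset (values are illustrative only).
toy = pd.DataFrame({
    "user_id": ["a", "a", "b", "c"],
    "user_date": pd.to_datetime(
        ["2022-09-18", "2022-09-18", "2022-10-07", "2022-10-26"]
    ),
    "amount_usd": [12.42, 5.00, 13.61, 90.11],
})

# Assumption: one cohort per calendar month of account creation.
toy["cohort"] = toy["user_date"].dt.to_period("M")

# Per-cohort summary: distinct users and total top-up volume.
cohort_summary = toy.groupby("cohort").agg(
    users=("user_id", "nunique"),
    total_usd=("amount_usd", "sum"),
)
print(cohort_summary)
```

Weekly or quarterly cohorts would only need a different period frequency in `to_period`.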

**Useful links:**
- Step-by-step guide: https://www.youtube.com/watch?v=s-32u6XdY7c
- What is RFM analysis: https://www.youtube.com/watch?v=guj2gVEEx4s
- Improve Random Forest Hyperparameters: https://towardsdatascience.com/random-forest-hyperparameters-and-how-to-fine-tune-them-17aee785ee0d#:~:text=The%20most%20important%20hyper%2Dparameters,MSE%20or%20MAE%20for%20regression)

**Questions to answer:**
- 1. Which customers have the highest top-up probability in the next 30 days?
- 2. Which customers have recently topped up but are unlikely to top up again?
- 3. Which customers were predicted to top up but didn't?
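
One common way to frame these questions is to hold out the last 30 days of transactions, compute recency, frequency, and monetary (RFM) features from the earlier history, and label each user by whether they topped up in the held-out window. The sketch below shows that split on toy data. It is an assumption about how the problem could be set up, not necessarily the exact procedure used later, and `horizon_days`, `cutoff`, `rfm`, and `topped_up_next_30d` are hypothetical names.

```python
import pandas as pd

# Toy transactions (illustrative values, not from the private dataset).
toy = pd.DataFrame({
    "user_id": ["a", "a", "b", "b", "c"],
    "txn_date": pd.to_datetime(
        ["2022-08-01", "2022-10-20", "2022-09-01", "2022-09-15", "2022-09-10"]
    ),
    "amount_usd": [10.0, 20.0, 5.0, 15.0, 50.0],
})

# Assumption: the prediction horizon is the final 30 days of the data.
horizon_days = 30
cutoff = toy["txn_date"].max() - pd.Timedelta(days=horizon_days)

history = toy[toy["txn_date"] <= cutoff]  # used to build features
future = toy[toy["txn_date"] > cutoff]    # used to build the target

# RFM features per user, measured relative to the cutoff date.
rfm = history.groupby("user_id").agg(
    recency_days=("txn_date", lambda d: (cutoff - d.max()).days),
    frequency=("txn_date", "count"),
    monetary=("amount_usd", "sum"),
)

# Binary target: 1 if the user topped up at least once after the cutoff.
rfm["topped_up_next_30d"] = rfm.index.isin(future["user_id"]).astype(int)
print(rfm)
```

A classifier trained on such a table yields a probability per user, which could then be compared against recency (question 2) and against the actual outcome (question 3).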

In [8]:
# Imports and general settings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

pd.set_option('display.max_columns', None)

## Data Wrangling and EDA

In [17]:
# Read the top-up transactions from the CSV file
df = pd.read_csv('../trash/topup.csv')

In [18]:
df = df[['transactionid', 'aug_created_date', 'clear_txn_method', '$ USD',
         'userid', 'User_Creation_Date', 'User_status', 'user_usermobilecountry', 'dateofbirth']]

In [29]:
df.columns = ['txn_id', 'txn_date', 'txn_type', 'amount_usd', 'user_id', 'user_date', 'user_status', 'country', 'dob']

In [32]:
df.head(3)

Unnamed: 0,txn_id,txn_date,txn_type,amount_usd,user_id,user_date,user_status,country,dob
0,b1334598-970a-4936-a535-818689761705,2022-10-27 00:00:00,MPESA load,12.42,b74a52e3-4160-933d-c653-644c3826e1bb,2022-09-18 00:00:00,TnC_ACCEPTED,KE,01/07/1992
1,513d6960-b689-424a-970d-9e0f20f36f44,2022-10-27 00:00:00,load using bank card,13.61,86611122-811a-5480-407d-daefaf2a42c2,2022-10-07 00:00:00,TnC_ACCEPTED,AE,07/15/1984
2,91d42b18-7e04-4be3-be9d-2b7e6734aa36,2022-10-27 00:00:00,load using bank card,1361.25,4d25ec74-1c5e-6b8e-83c4-f4893bfc7147,2022-09-30 00:00:00,TnC_ACCEPTED,AE,09/23/1987


In [34]:
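# Parse transaction dates; infer_datetime_format is deprecated since pandas 2.0,
# where consistent format inference is the default behaviour.
df['txn_date'] = pd.to_datetime(df['txn_date'])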

In [36]:
df

Unnamed: 0,txn_id,txn_date,txn_type,amount_usd,user_id,user_date,user_status,country,dob
0,b1334598-970a-4936-a535-818689761705,2022-10-27,MPESA load,12.42,b74a52e3-4160-933d-c653-644c3826e1bb,2022-09-18 00:00:00,TnC_ACCEPTED,KE,01/07/1992
1,513d6960-b689-424a-970d-9e0f20f36f44,2022-10-27,load using bank card,13.61,86611122-811a-5480-407d-daefaf2a42c2,2022-10-07 00:00:00,TnC_ACCEPTED,AE,07/15/1984
2,91d42b18-7e04-4be3-be9d-2b7e6734aa36,2022-10-27,load using bank card,1361.25,4d25ec74-1c5e-6b8e-83c4-f4893bfc7147,2022-09-30 00:00:00,TnC_ACCEPTED,AE,09/23/1987
3,025af1e7-3f13-4925-b554-f61683dd656a,2022-10-27,load using bank card,90.11,78440d56-048f-3698-1aa0-5bf01029da90,2022-10-26 00:00:00,TnC_ACCEPTED,AE,05/27/2000
4,9884ceae-818a-4f23-b126-0aab20b5b47f,2022-10-27,load using bank card,13.61,86611122-811a-5480-407d-daefaf2a42c2,2022-10-07 00:00:00,TnC_ACCEPTED,AE,07/15/1984
...,...,...,...,...,...,...,...,...,...
165609,5f40af98-8e7d-4640-b4e1-40662109f39b,2019-10-30,load using bank card,14.57,6823f09f-c6d5-19b4-a3d3-fcb9fa3a231e,2019-03-09 00:00:00,DEACTIVATED,AE,5/2/1975
165610,2da643e1-fb9c-4e2b-aec5-a1f4061e138f,2019-10-30,load using bank card,14.28,6823f09f-c6d5-19b4-a3d3-fcb9fa3a231e,2019-03-09 00:00:00,DEACTIVATED,AE,5/2/1975
165611,57d66422-ad89-4552-b766-b42c31bcc5ca,2019-10-30,load using bank card,2.85,6823f09f-c6d5-19b4-a3d3-fcb9fa3a231e,2019-03-09 00:00:00,DEACTIVATED,AE,5/2/1975
165612,b1e15e8e-8cdd-4080-8da0-d5f7ec500e06,2019-10-30,load using bank card,3.13,6823f09f-c6d5-19b4-a3d3-fcb9fa3a231e,2019-03-09 00:00:00,DEACTIVATED,AE,5/2/1975


In [35]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 165614 entries, 0 to 165613
Data columns (total 9 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   txn_id       165614 non-null  object        
 1   txn_date     165614 non-null  datetime64[ns]
 2   txn_type     165614 non-null  object        
 3   amount_usd   165614 non-null  float64       
 4   user_id      165614 non-null  object        
 5   user_date    165614 non-null  object        
 6   user_status  165614 non-null  object        
 7   country      165614 non-null  object        
 8   dob          165614 non-null  object        
dtypes: datetime64[ns](1), float64(1), object(7)
memory usage: 11.4+ MB
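
The `df.info()` output above shows that `user_date` and `dob` are still stored as `object`. Since the cohorts depend on the user creation date, those columns would probably need converting as well. A minimal sketch on toy rows follows. It assumes `dob` is month/day/year (consistent with sample values such as `07/15/1984`) and that `user_date` always follows the `YYYY-MM-DD HH:MM:SS` pattern seen in the preview; unparseable values become `NaT` through `errors="coerce"`.

```python
import pandas as pd

# Toy rows copying the string formats visible in the preview above.
toy = pd.DataFrame({
    "user_date": ["2022-09-18 00:00:00", "2019-03-09 00:00:00"],
    "dob": ["01/07/1992", "5/2/1975"],
})

# Assumption: user_date always uses this timestamp pattern.
toy["user_date"] = pd.to_datetime(
    toy["user_date"], format="%Y-%m-%d %H:%M:%S", errors="coerce"
)

# Assumption: dob is month/day/year, with or without zero padding.
toy["dob"] = pd.to_datetime(toy["dob"], format="%m/%d/%Y", errors="coerce")

print(toy.dtypes)
```

Counting the `NaT` values after conversion (for example with `toy["dob"].isna().sum()`) is a quick way to check whether the format assumption holds on the full dataset.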
