### What's a Customer Worth?

![](https://cdn-images-1.medium.com/max/800/1*46_4Ps1cl9Ansa2f0BpaeQ.png)

### Objective

Customers keep coming and going, but they do so silently. 

1. Is there a specific metric that quantifies the relationship between customers and the business?
2. What are the individual components that play a vital role in calculating this metric?
3. Using the individual components, how do we calculate the metric? 

One such metric is CLV (Customer Lifetime Value). The objective of this kernel is to understand how CLV is calculated.

In [None]:
import numpy as np 
import pandas as pd

hist = pd.read_csv('../input/historical_transactions.csv')

In [None]:
hist = hist[['card_id','purchase_date','purchase_amount']]
hist = hist.sort_values(by=['card_id', 'purchase_date'], ascending=[True, True])
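
Since only three columns are needed, they can also be selected while the file is being read. The following is a minimal sketch, assuming the CSV uses the same three column names as above; the in-memory `sample_csv` text and its `city_id` column are made up for illustration and simply stand in for the real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for historical_transactions.csv (all values are made up).
sample_csv = io.StringIO(
    "card_id,city_id,purchase_date,purchase_amount\n"
    "C_2,10,2018-01-05 10:00:00,0.50\n"
    "C_1,11,2017-12-01 09:30:00,1.25\n"
    "C_1,11,2017-11-15 18:45:00,0.75\n"
)

# usecols skips the columns we do not need;
# parse_dates converts purchase_date to datetime while reading.
sample = pd.read_csv(
    sample_csv,
    usecols=["card_id", "purchase_date", "purchase_amount"],
    parse_dates=["purchase_date"],
)

# Same ordering as in the kernel: by card, then by purchase date.
sample = sample.sort_values(["card_id", "purchase_date"]).reset_index(drop=True)
print(sample)
```

Reading fewer columns can reduce memory use on a large transactions file, and parsing the dates up front avoids a separate `pd.to_datetime` step later.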

### (R)ecency (F)requency (M)onetary Value

Why do we keep just three columns of the historical transactions dataset? CLV models use the following components, all of which can be derived from those columns:

* Recency - The time elapsed since the customer's most recent transaction. (Current_date - last_transaction_date)
* Frequency - The total number of transactions (visits) a customer has made. (Count of total transactions)
* Monetary - The total purchase amount for a given customer, stored as `Monitary` in the code. (Sum of purchase_amount)
* Time - The age of the customer, i.e. the time span between the customer's first and last transaction.
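
To make the four definitions concrete, here is a small sketch on made-up data. The `toy` frame, the card IDs, and the fixed `snapshot` date are all assumptions for illustration; the kernel itself uses the real transactions and the current timestamp.

```python
import pandas as pd

# Made-up transactions for two hypothetical cards.
toy = pd.DataFrame({
    "card_id": ["A", "A", "A", "B"],
    "purchase_date": pd.to_datetime(
        ["2018-01-01", "2018-01-11", "2018-01-31", "2018-01-21"]
    ),
    "purchase_amount": [10.0, 20.0, 30.0, 5.0],
})

# Assumed "current date" for this example only.
snapshot = pd.Timestamp("2018-02-10")

# One pass over the groups: first/last purchase, count, and sum per card.
rfm = toy.groupby("card_id").agg(
    first_purchase=("purchase_date", "min"),
    last_purchase=("purchase_date", "max"),
    Frequency=("purchase_amount", "count"),
    Monetary=("purchase_amount", "sum"),
)

# Recency: days since the last purchase. Time: days between first and last purchase.
rfm["Recency"] = (snapshot - rfm["last_purchase"]).dt.days
rfm["Time"] = (rfm["last_purchase"] - rfm["first_purchase"]).dt.days

print(rfm[["Recency", "Frequency", "Monetary", "Time"]])
```

Card `B` has a single transaction, so its `Time` is 0 days even though its `Recency` is not.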

In [None]:
hist.head()

In [None]:
## Time and Recency
last_purchase = hist.groupby('card_id')['purchase_date'].max().reset_index()
first_purchase = hist.groupby('card_id')['purchase_date'].min().reset_index()

last_purchase.columns = ['card_id', 'Max']
first_purchase.columns = ['card_id', 'Min']

## Current timestamp, used as the reference point for Recency
curr_date = pd.Timestamp.now()

rec = pd.merge(last_purchase, first_purchase, how='left', on='card_id')
rec['Min'] = pd.to_datetime(rec['Min'])
rec['Max'] = pd.to_datetime(rec['Max'])

## Recency value: current date - most recent purchase date, in whole days
rec['Recency'] = (curr_date - rec['Max']).dt.days

## Time value: age of customer, MAX - MIN, in whole days
rec['Time'] = (rec['Max'] - rec['Min']).dt.days

rec = rec[['card_id','Time','Recency']]
rec.head()
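
Because `Recency` is measured against the wall clock, its values shift every time the kernel is rerun. One alternative is to anchor recency to the latest purchase date found in the data. This is a sketch and not the kernel's own method; `toy_hist` and `snapshot_date` are hypothetical names.

```python
import pandas as pd

# Made-up purchase dates for two hypothetical cards.
toy_hist = pd.DataFrame({
    "card_id": ["A", "A", "B"],
    "purchase_date": pd.to_datetime(["2018-01-01", "2018-02-20", "2018-02-28"]),
})

# Reference point: the most recent purchase anywhere in the data.
snapshot_date = toy_hist["purchase_date"].max()

# Days between the reference point and each card's last purchase.
last_purchase = toy_hist.groupby("card_id")["purchase_date"].max()
recency_days = (snapshot_date - last_purchase).dt.days
print(recency_days)
```

With a data-derived reference date, the feature stays the same across reruns, and the most recently active card gets a recency of 0.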

In [None]:
## Frequency
freq = hist.groupby('card_id').size().reset_index()
freq.columns = ['card_id', 'Frequency']
freq.head()

In [None]:
## Monetary (total purchase amount per card)
mon = hist.groupby('card_id')['purchase_amount'].sum().reset_index()
mon.columns = ['card_id', 'Monitary']
mon.head()

In [None]:
final = pd.merge(freq,mon,how = 'left', on = 'card_id')
final = pd.merge(final,rec,how = 'left', on = 'card_id')

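## Historic CLV as defined in this kernel: transaction count x total purchase amount
final['historic_CLV'] = final['Frequency'] * final['Monitary']
## AOV - average order value, i.e. total purchase amount / total transactions
final['AOV'] = final['Monitary'] / final['Frequency']
## Predictive CLV as defined in this kernel; it is 0 whenever Time is 0 (e.g. a single transaction)
final['Predictive_CLV'] = final['Time'] * final['AOV'] * final['Monitary'] * final['Recency']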

final.head()
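
To check the arithmetic, the same three formulas can be applied to two made-up rows. This only illustrates the kernel's formulas on hypothetical numbers; `toy_final` is not part of the dataset, and the column names follow the kernel's spelling.

```python
import pandas as pd

# Made-up feature rows for two hypothetical cards.
toy_final = pd.DataFrame({
    "card_id": ["A", "B"],
    "Frequency": [4, 1],
    "Monitary": [80.0, 5.0],
    "Time": [30, 0],      # days between first and last purchase
    "Recency": [10, 20],  # days since the last purchase
})

# Same formulas as in the cell above.
toy_final["historic_CLV"] = toy_final["Frequency"] * toy_final["Monitary"]
toy_final["AOV"] = toy_final["Monitary"] / toy_final["Frequency"]
toy_final["Predictive_CLV"] = (
    toy_final["Time"] * toy_final["AOV"] * toy_final["Monitary"] * toy_final["Recency"]
)

print(toy_final[["card_id", "AOV", "historic_CLV", "Predictive_CLV"]])
```

Card `A` gets an AOV of 20.0, a `historic_CLV` of 320.0, and a `Predictive_CLV` of 480000.0. Card `B` has a single transaction, so its `Time` is 0 and its `Predictive_CLV` is 0.0.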

### Hope these features boost your model performance. HAPPY KAGGLING! 