### What is a Cohort?
Cohort means "a group of people with a shared characteristic".

### What is Cohort Analysis?
Cohort analysis is a subset of behavioral analytics that takes the data from a given dataset (e.g. an electronic medical record system (EMRS), an e-commerce platform, a web application, or an online game) and, rather than looking at all users as one unit, breaks them into related groups for analysis.

### Types of Cohorts

1. Time cohorts - customers who signed up for a product during a particular time frame. Analyzing these cohorts shows how customer behavior depends on when customers started using the company’s products. The time frame may be daily, monthly, or quarterly.
2. Behavior cohorts - customers who purchased a product or subscribed to a service in the past, grouped by the type of product or service they signed up for. Customers who signed up for basic-level services might have different needs than those who signed up for advanced services. Understanding the needs of the various cohorts can help a company design custom-made services or products for particular segments.
3. Size cohorts - the various sizes of customers who purchase the company’s products or services. This categorization can be based on the amount spent over some period after acquisition, or on the product type on which the customer spent most of their order amount over a given period.
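A minimal sketch of how the three cohort types might be assigned with pandas, using hypothetical toy transactions. The `Plan` and `Amount` columns and the 100-unit spending threshold are illustrative assumptions, not fields of the dataset used below.

```python
import pandas as pd

# Hypothetical toy transactions (not from the Online Retail II dataset)
tx = pd.DataFrame({
    "Customer ID": [1, 1, 2, 3, 3, 4],
    "InvoiceDate": pd.to_datetime([
        "2010-01-05", "2010-03-10", "2010-01-20",
        "2010-02-02", "2010-02-15", "2010-03-01",
    ]),
    "Plan": ["basic", "basic", "advanced", "basic", "basic", "advanced"],  # hypothetical product tier
    "Amount": [20.0, 35.0, 300.0, 15.0, 10.0, 120.0],                      # hypothetical spend
})

# One row per customer
customers = tx.groupby("Customer ID").agg(
    first_purchase=("InvoiceDate", "min"),
    plan=("Plan", "first"),
    total_spend=("Amount", "sum"),
)

# Time cohort: the month of the customer's first purchase
customers["time_cohort"] = customers["first_purchase"].dt.to_period("M")

# Behavior cohort: the product tier the customer bought
customers["behavior_cohort"] = customers["plan"]

# Size cohort: spending band (the 100-unit cut-off is an arbitrary assumption)
customers["size_cohort"] = pd.cut(
    customers["total_spend"], bins=[0, 100, float("inf")], labels=["small", "large"]
)
print(customers[["time_cohort", "behavior_cohort", "size_cohort"]])
```

The rest of this notebook works only with time cohorts, grouped by month.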

### Dataset Description

1. Invoice - Invoice number
2. StockCode - Product (stock) code
3. Description - Product name
4. Quantity - Quantity of the product in the transaction
5. InvoiceDate - Date and time of purchase
6. Price - Unit price of the product
7. Customer ID - Customer identifier
8. Country - Name of the customer's country

### Problem

1. Perform a cohort analysis.
2. Visualize customer retention (%).
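One common way to define the retention percentage, written to match the `CohortIndex` convention used later in this notebook (index 1 is the cohort's first month). Let $N_c(k)$ be the number of distinct customers in cohort $c$ who made a purchase in month $k$ of that cohort:

$$
R_c(k) = \frac{N_c(k)}{N_c(1)} \times 100\%
$$

By construction $R_c(1) = 100\%$ for every cohort. As a hypothetical example, if 50 customers first bought in January and 20 of them bought again in March ($k = 3$), then $R_{\text{Jan}}(3) = 20/50 = 40\%$.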

### Importing Libraries and Dataset


In [None]:
#importing Libraries
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt

pd.set_option('display.max_columns',None)
pd.set_option('display.max_rows',None)


In [None]:
#importing Data
data=pd.read_csv('../input/online-retail-ii-uci/online_retail_II.csv',parse_dates=['InvoiceDate'])

In [None]:
#Dataset Shape
data.shape

### Cleaning Dataset

In [None]:
# Checking for Null Values
data.isna().sum()

In [None]:
# Dropping Null Values in Customer ID Column
data=data.dropna(subset=['Customer ID'])

In [None]:
# Checking for Duplicates
data.duplicated().sum()

In [None]:
# Dropping duplicates
data=data.drop_duplicates()
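A small sketch, on hypothetical rows, of what the two cleaning calls above do: `dropna(subset=['Customer ID'])` checks only the Customer ID column, and `drop_duplicates()` removes any row that matches an earlier row in every column.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: row 1 lacks a Description, row 2 lacks a Customer ID,
# and row 3 is an exact copy of row 0.
raw = pd.DataFrame({
    "Invoice": ["A1", "A2", "A3", "A1"],
    "Description": ["MUG", np.nan, "CANDLE", "MUG"],
    "Customer ID": [101.0, 102.0, np.nan, 101.0],
})

# Only Customer ID is checked, so the row missing just its Description survives
cleaned = raw.dropna(subset=["Customer ID"])

# Rows identical in every column are dropped, keeping the first occurrence
cleaned = cleaned.drop_duplicates()
print(cleaned)
```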

### Cohort Analysis

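Time period: one month. Every invoice is labelled with its month (InvoiceMonth), and each customer's cohort is the month of their first purchase (CohortMonth).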

In [None]:
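# Truncate each invoice date to the first day of its month
data['InvoiceMonth']=data['InvoiceDate'].dt.to_period('M').dt.to_timestamp()

# A customer's cohort is the month of their first purchase;
# transform('min') writes that value back onto every row of the customer
grouping= data.groupby('Customer ID')['InvoiceMonth']
data['CohortMonth']=grouping.transform('min')
data.head(3)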
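A toy sketch of the cohort-month step on hypothetical purchases. The point it shows is that `transform('min')` returns a result aligned with the original rows, so each purchase carries its customer's first month, whereas `agg('min')` would collapse to one row per customer. Month-start truncation is done here with `dt.to_period('M').dt.to_timestamp()`.

```python
import pandas as pd

# Hypothetical purchases for two customers
df = pd.DataFrame({
    "Customer ID": [1, 1, 2, 2],
    "InvoiceDate": pd.to_datetime([
        "2010-01-15 10:30", "2010-03-02 09:00",
        "2010-02-28 17:45", "2010-02-01 08:00",
    ]),
})

# First day of each invoice's month
df["InvoiceMonth"] = df["InvoiceDate"].dt.to_period("M").dt.to_timestamp()

# Each customer's earliest month, broadcast back onto all of their rows
df["CohortMonth"] = df.groupby("Customer ID")["InvoiceMonth"].transform("min")
print(df)
```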

In [None]:
# Split a datetime column into its year and month components
def cohort_index(df,column):
    year=df[column].dt.year
    month=df[column].dt.month
    return year,month

inv_year,inv_month=cohort_index(data,'InvoiceMonth')
coh_year,coh_month=cohort_index(data,'CohortMonth')

# Months elapsed since the cohort month; the cohort month itself is index 1
data['CohortIndex']=((inv_year-coh_year)*12)+(inv_month-coh_month)+1
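The `CohortIndex` arithmetic above can be checked by hand. This sketch wraps the same formula in a hypothetical helper, `months_since_cohort`, that takes plain integers instead of DataFrame columns.

```python
def months_since_cohort(inv_year, inv_month, coh_year, coh_month):
    """Hypothetical helper: months since the cohort month, counting that month as 1."""
    return (inv_year - coh_year) * 12 + (inv_month - coh_month) + 1

# Cohort month December 2009, invoice month February 2010:
# (2010 - 2009) * 12 + (2 - 12) + 1 = 12 - 10 + 1 = 3
print(months_since_cohort(2010, 2, 2009, 12))  # 3
```

The year term carries the count across year boundaries, which is why the month difference alone (here 2 - 12 = -10) is not enough.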

In [None]:
# Count distinct customers active in each (cohort month, cohort index) cell
grouping=data.groupby(['CohortMonth','CohortIndex'])
cohort_data=grouping['Customer ID'].nunique().reset_index()
cohort_data

In [None]:
# Reshape into a matrix: one row per cohort month, one column per cohort index
cohort_counts=cohort_data.pivot(index='CohortMonth',columns='CohortIndex',values='Customer ID')

In [None]:
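# Cohort size = number of customers active in month 1 of each cohort
cohort_sizes=cohort_counts.iloc[:,0]
# Divide each row by its cohort size to get the retention fraction
retention = cohort_counts.divide(cohort_sizes,axis=0)
# Display the index as plain dates instead of timestamps
retention.index=retention.index.date
retention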
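An end-to-end toy sketch of the counting and retention steps above, on hypothetical data with three customers. It uses `unstack()` in place of `pivot` (equivalent here, since each cell has exactly one count) and `counts[1]`, which selects the same column as `iloc[:, 0]` whenever cohort index 1 is the first column.

```python
import pandas as pd

# Hypothetical purchases, already truncated to month start
df = pd.DataFrame({
    "Customer ID": [1, 1, 2, 2, 3],
    "InvoiceMonth": pd.to_datetime([
        "2010-01-01", "2010-02-01",
        "2010-01-01", "2010-03-01",
        "2010-02-01",
    ]),
})
df["CohortMonth"] = df.groupby("Customer ID")["InvoiceMonth"].transform("min")
df["CohortIndex"] = (
    (df["InvoiceMonth"].dt.year - df["CohortMonth"].dt.year) * 12
    + (df["InvoiceMonth"].dt.month - df["CohortMonth"].dt.month)
    + 1
)

# Distinct active customers per (cohort month, cohort index) cell
counts = (
    df.groupby(["CohortMonth", "CohortIndex"])["Customer ID"]
    .nunique()
    .unstack()
)

# Divide each row by its cohort size (column 1) to get retention fractions
retention = counts.divide(counts[1], axis=0)
print(retention)
```

Here the January cohort (customers 1 and 2) keeps one of its two customers in month 2 and a different one in month 3, so both cells show 0.5, while the February cohort has no later activity and its later cells are NaN.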

In [None]:
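plt.figure(figsize=(25,25))
plt.title('Retention Rate')
# vmax=0.5 caps the color scale at 50% so differences among later months stay visible;
# month-1 cells (always 100%) simply saturate at the darkest color
sns.heatmap(retention,annot=True,fmt='.0%',vmin = 0.0,vmax = 0.5,cmap='Blues')
plt.show()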