# Cohort analysis

Cohort analysis is a descriptive analytics tool. It groups customers into mutually exclusive cohorts that are measured over time. It provides deeper insight than so-called vanity metrics, and it helps explain high-level trends by showing metrics across both the product and the customer lifecycle.

# Types of cohorts
## Time cohorts
Customers who signed up for a product or service during a particular time frame. Analyzing these cohorts shows how customer behaviour depends on when they started using the company's products or services. The time frame can be monthly, quarterly, or even daily.

## Behaviour cohorts
Customers who purchased a product or subscribed to a service in the past. This groups customers by the type of product or service they signed up for: those signing up for a basic-level service may behave differently from those going premium. Understanding the needs of the various cohorts can help a company design custom-made services or products for particular segments.

## Size cohorts
Customers grouped by size, that is, by how much they purchase of the company's products or services. This categorization can be based on the amount spent in some period of time after acquisition, or on the product type that accounted for most of the customer's order amount in some period of time.
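
As a rough illustration of spend-based size cohorts, the sketch below splits a handful of made-up customers into three equally sized groups with `pd.qcut`. The `Spend90d` column, the 90-day window, and the `low`/`mid`/`high` labels are assumptions chosen for the example; they are not part of the dataset used later.

```python
import pandas as pd

# Hypothetical spend per customer in the first 90 days after acquisition.
spend = pd.DataFrame(
    {
        "CustomerID": [1, 2, 3, 4, 5, 6],
        "Spend90d": [20.0, 35.0, 120.0, 150.0, 900.0, 1200.0],
    }
)

# Quantile-based binning: each size cohort holds a third of the customers.
spend["SizeCohort"] = pd.qcut(spend["Spend90d"], q=3, labels=["low", "mid", "high"])

print(spend)
```

Quantile bins keep the cohorts balanced in size; fixed spend thresholds (`pd.cut`) would be the alternative when the boundaries carry business meaning.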

# Elements of cohort analysis
## Pivot table
- Assigned cohort in rows
- Cohort index in columns
- Metrics in the table
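
To make that layout concrete, here is a minimal sketch that pivots a small, made-up long-format table into the cohort shape: cohorts in rows, cohort index in columns, and the metric in the cells. The numbers and the `Customers` column name are invented for illustration.

```python
import pandas as pd

# Toy long-format data: one row per (cohort, cohort index) pair.
long_df = pd.DataFrame(
    {
        "CohortMonth": ["2011-01", "2011-01", "2011-01", "2011-02", "2011-02"],
        "CohortIndex": [1, 2, 3, 1, 2],
        "Customers": [10, 4, 3, 8, 5],
    }
)

# Rows: assigned cohort. Columns: cohort index. Cells: the metric.
toy_pivot = long_df.pivot(index="CohortMonth", columns="CohortIndex", values="Customers")

print(toy_pivot)
```

Later cohorts have fewer observed periods, so the lower-right corner of the table is filled with `NaN`.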

In [None]:
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt
import seaborn as sns



In [None]:
cohort_counts = pd.read_csv("./data/chapter_1/cohort_counts.csv")

In [None]:
cohort_counts


There are 332 customers who made their first transaction in January 2011.

# Time Cohorts

Time-based cohorts group customers by the time they completed their first activity. Here we segment customers into acquisition cohorts based on the month of their first purchase.

We then assign a cohort index to each of the customer's transactions. It represents the number of months since the first transaction. In the next step we calculate metrics such as retention or average spend value and display them as a heatmap.
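
The month offset can be checked by hand before applying it to the full dataset. The helper below is a hypothetical standalone function (the name `cohort_index` is not from the notebook); it uses the same arithmetic as the notebook code further down, where the acquisition month itself gets index 1.

```python
def cohort_index(cohort_year, cohort_month, invoice_year, invoice_month):
    """Months elapsed since the cohort month, counting the cohort month as 1."""
    years_diff = invoice_year - cohort_year
    months_diff = invoice_month - cohort_month
    return years_diff * 12 + months_diff + 1


# A purchase in the acquisition month itself.
print(cohort_index(2010, 12, 2010, 12))  # 1

# Acquired December 2010, purchase in February 2011: 12 + (2 - 12) + 1.
print(cohort_index(2010, 12, 2011, 2))  # 3
```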





In [None]:
online = pd.read_csv("./data/chapter_1/online.csv", parse_dates=["InvoiceDate"])
online.head()

In [None]:
def get_month(x):
    """Truncate a timestamp to the first day of its month."""
    return dt.datetime(x.year, x.month, 1)

online["InvoiceMonth"] = online["InvoiceDate"].apply(get_month)
online["CohortMonth"] = online.groupby("CustomerID")["InvoiceMonth"].transform("min")
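
As an aside, the same month truncation can be done without a row-wise `apply`. The sketch below uses `dt.to_period("M")` followed by `dt.to_timestamp()` on a small made-up frame; it is an alternative to `get_month`, not the approach this notebook uses, and the `toy` data is invented.

```python
import pandas as pd

# Toy transactions for two customers.
toy = pd.DataFrame(
    {
        "CustomerID": [1, 1, 2, 2],
        "InvoiceDate": pd.to_datetime(
            ["2011-01-15 10:00", "2011-03-02 09:30", "2011-02-20 14:00", "2011-02-25 08:00"]
        ),
    }
)

# Truncate each date to the first day of its month (vectorized).
toy["InvoiceMonth"] = toy["InvoiceDate"].dt.to_period("M").dt.to_timestamp()

# The cohort month is the earliest invoice month per customer.
toy["CohortMonth"] = toy.groupby("CustomerID")["InvoiceMonth"].transform("min")

print(toy)
```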

In [None]:
online

In [None]:
def get_date_int(df, column):
    """Return the year, month and day of a datetime column as integer Series."""
    year = df[column].dt.year
    month = df[column].dt.month
    day = df[column].dt.day
    return year, month, day

invoice_year, invoice_month, _ = get_date_int(online, "InvoiceMonth")
cohort_year, cohort_month, _ = get_date_int(online, "CohortMonth")

# Months elapsed since the cohort month; +1 so the acquisition month has index 1.
years_diff = invoice_year - cohort_year
months_diff = invoice_month - cohort_month
online["CohortIndex"] = years_diff * 12 + months_diff + 1
online.head()

In [None]:
cohort_data = online.groupby(["CohortMonth", "CohortIndex"])["CustomerID"].nunique().reset_index()

cohort_counts = cohort_data.pivot(index="CohortMonth", columns="CohortIndex", values="CustomerID")


In [None]:
cohort_counts


The first column shows how many customers each cohort starts with (100% of the cohort). The following columns show how many of those customers were still active in each subsequent month.

# Metrics

## Retention Rate

In [None]:
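# Cohort size = number of customers in the first period of each cohort.
cohort_sizes = cohort_counts.iloc[:, 0]
# Divide each row by its cohort size; round after scaling to avoid float noise.
retention = (cohort_counts.divide(cohort_sizes, axis=0) * 100).round(1)
retention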

## Other Metrics


In [None]:
cohort_data = online.groupby(["CohortMonth", "CohortIndex"])["Quantity"].mean()
cohort_data = cohort_data.reset_index()
average_quantity = cohort_data.pivot(index="CohortMonth", columns="CohortIndex", values="Quantity")

average_quantity = average_quantity.round(2)
average_quantity

# Visualizing Cohort Analysis




In [None]:
plt.figure(figsize=(12, 8))
plt.title("Cohort Analysis: Retention Rates")
sns.heatmap(
    retention,
    annot=True,
    fmt=".0f",
    cmap="Blues",
    linewidths=0.5,
    linecolor="white",
    cbar_kws={"label": "Retention Rate (%)"},
)
plt.xlabel("Cohort Index")
plt.show()