# Exploratory Analysis & Summary Statistics

**We will calculate common marketing metrics with pandas and visualize the results.**

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

marketing = pd.read_csv('./datasets/marketing.csv')

In [2]:
print(marketing.head())

      user_id date_served marketing_channel          variant converted  \
0  a100000029      1/1/18         House Ads  personalization      True   
1  a100000030      1/1/18         House Ads  personalization      True   
2  a100000031      1/1/18         House Ads  personalization      True   
3  a100000032      1/1/18         House Ads  personalization      True   
4  a100000033      1/1/18         House Ads  personalization      True   

  language_displayed language_preferred    age_group date_subscribed  \
0            English            English   0-18 years          1/1/18   
1            English            English  19-24 years          1/1/18   
2            English            English  24-30 years          1/1/18   
3            English            English  30-36 years          1/1/18   
4            English            English  36-45 years          1/1/18   

  date_canceled subscribing_channel is_retained  
0           NaN           House Ads        True  
1           NaN       

#### Conversion Rate

The percentage of people we marketed to who ultimately converted to our product.

\begin{equation*}
\text{Conversion Rate} = \frac{\text{Number of people who convert}}{\text{Total Number of people we marketed to}}
\end{equation*}

In [3]:
# Unique users with at least one converted row
subscribers = marketing[marketing['converted'] == True]['user_id'].nunique()
# All unique users who were served marketing
total = marketing['user_id'].nunique()
conv_rate = subscribers / total

print(round(conv_rate * 100, 2), '%')

13.89 %
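
The same calculation can be wrapped in a small helper. The sketch below is illustrative only: `conversion_rate` and the toy DataFrame are hypothetical and not part of the original dataset. It mainly shows why `nunique()` is used: a user who appears in several rows is counted once.

```python
import pandas as pd

def conversion_rate(df: pd.DataFrame) -> float:
    """Share of unique users with at least one converted row (hypothetical helper)."""
    converted = df[df['converted'] == True]['user_id'].nunique()
    total = df['user_id'].nunique()
    return converted / total

# Toy data, made up for illustration: user 'a1' appears in two rows.
toy = pd.DataFrame({
    'user_id':   ['a1', 'a1', 'a2', 'a3', 'a4'],
    'converted': [True, True, False, False, True],
})

toy_rate = conversion_rate(toy)
print(round(toy_rate * 100, 2), '%')
```

Here two of the four unique users converted, so counting rows instead of users would overstate the rate.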


#### Retention Rate

The percentage of people who remain subscribed after a certain period of time.

\begin{equation*}
\text{Retention Rate} = \frac{\text{Number of people who remain subscribed}}{\text{Total Number of people who converted}}
\end{equation*}

In [4]:
# Unique users flagged as retained
retained = marketing[marketing['is_retained'] == True]['user_id'].nunique()

# Unique users who converted
subscribers = marketing[marketing['converted'] == True]['user_id'].nunique()

retention = retained / subscribers

print(round(retention * 100, 2), '%')

66.8 %


### Customer Segmentation

**Common ways to segment audiences**
- Age
- Gender
- Location
- Past interactions with the business
- Marketing channels the user interacted with
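
Any of these attributes can act as a grouping key. As a rough sketch, assuming a DataFrame with the same `user_id`, `converted`, and `age_group` columns as `marketing`, the hypothetical helper `conversion_by_segment` below computes a conversion rate per segment. The toy rows are invented for illustration.

```python
import pandas as pd

def conversion_by_segment(df: pd.DataFrame, segment: str) -> pd.Series:
    """Conversion rate (in %) per value of `segment` (hypothetical helper)."""
    total = df.groupby(segment)['user_id'].nunique()
    converted = df[df['converted'] == True].groupby(segment)['user_id'].nunique()
    # Segments with no converted users are absent from `converted`; count them as 0.
    converted = converted.reindex(total.index, fill_value=0)
    return converted / total * 100

# Toy data, made up for illustration.
toy = pd.DataFrame({
    'user_id':   ['a1', 'a2', 'a3', 'a4', 'a5', 'a6'],
    'age_group': ['19-24 years', '19-24 years', '24-30 years',
                  '24-30 years', '30-36 years', '30-36 years'],
    'converted': [True, False, True, True, False, False],
})

segment_rates = conversion_by_segment(toy, 'age_group')
print(segment_rates)
```

The `reindex` step is a design choice: without it, a segment with zero conversions would show up as `NaN` rather than 0 after the division.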

**Subset of users who subscribed through House Ads**

In [5]:
house_ads = marketing[marketing['subscribing_channel'] == 'House Ads']

print(house_ads.head())

      user_id date_served marketing_channel          variant converted  \
0  a100000029      1/1/18         House Ads  personalization      True   
1  a100000030      1/1/18         House Ads  personalization      True   
2  a100000031      1/1/18         House Ads  personalization      True   
3  a100000032      1/1/18         House Ads  personalization      True   
4  a100000033      1/1/18         House Ads  personalization      True   

  language_displayed language_preferred    age_group date_subscribed  \
0            English            English   0-18 years          1/1/18   
1            English            English  19-24 years          1/1/18   
2            English            English  24-30 years          1/1/18   
3            English            English  30-36 years          1/1/18   
4            English            English  36-45 years          1/1/18   

  date_canceled subscribing_channel is_retained  
0           NaN           House Ads        True  
1           NaN       

**Percentage of users retained through *'House Ads'***

In [6]:
retained = house_ads[house_ads['is_retained'] == True]['user_id'].nunique()

subscribers = house_ads[house_ads['converted'] == True]['user_id'].nunique()

retention = retained / subscribers

print(round(retention*100, 2), '%')

58.05 %


**Retained users by channel**

In [7]:
retained = marketing[marketing['is_retained'] == True].groupby(['subscribing_channel'])['user_id'].nunique()
print(retained)

subscribing_channel
Email        141
Facebook     152
House Ads    173
Instagram    158
Push          54
Name: user_id, dtype: int64


**Converted users by channel**

In [8]:
subscribers = marketing[marketing['converted'] == True].groupby(['subscribing_channel'])['user_id'].nunique()
print(subscribers)

subscribing_channel
Email        161
Facebook     221
House Ads    298
Instagram    232
Push          77
Name: user_id, dtype: int64


**Retention Rate by channel**

In [9]:
# Series division aligns on the subscribing_channel index
channel_retention_rate = (retained / subscribers) * 100
print(channel_retention_rate)

subscribing_channel
Email        87.577640
Facebook     68.778281
House Ads    58.053691
Instagram    68.103448
Push         70.129870
Name: user_id, dtype: float64
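
A bar chart is one way to visualize these rates. The sketch below rebuilds the two count Series from the printed outputs above so that it runs on its own; in the notebook, the existing `retained` and `subscribers` Series could be used directly. The axis labels, title, and y-axis range are assumptions, not taken from the original.

```python
import pandas as pd
import matplotlib.pyplot as plt

channels = ['Email', 'Facebook', 'House Ads', 'Instagram', 'Push']
# Counts copied from the printed outputs of the two groupby cells.
retained = pd.Series([141, 152, 173, 158, 54], index=channels)
subscribers = pd.Series([161, 221, 298, 232, 77], index=channels)

# Retention rate per channel, in percent.
channel_retention_rate = retained / subscribers * 100

fig, ax = plt.subplots()
ax.bar(channel_retention_rate.index, channel_retention_rate.values)
ax.set_xlabel('Subscribing channel')
ax.set_ylabel('Retention rate (%)')
ax.set_title('Retention rate by subscribing channel')
ax.set_ylim(0, 100)  # fixed range keeps the bars comparable as percentages
plt.show()
```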
