# A/B Testing Analysis for Ad Campaign

## Data Source
The data used is from Kaggle, linked [here](https://www.kaggle.com/datasets/amirmotefaker/ab-testing-dataset/data). 

## Goal 
The goal of this notebook is to analyze a marketing experiment using A/B testing: a test campaign is compared to a control campaign using inferential statistics (hypothesis testing) and visualizations. From this, we should be able to tell whether the test campaign made a difference in the number of purchases. 

## Skills Used
* Python Programming
* Data cleaning
* Feature Engineering
* Descriptive Statistics
* Hypothesis Testing (with SciPy)
* Time Series Visualization (with Plotly) 
* Histogram Visualization (with Plotly)

In [115]:
##import dependencies 

import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
import sklearn 
from scipy.stats import shapiro 
from scipy.stats import normaltest


### Import Data 
There are two data files to import, one for the control group and one for the test group. Note that these files are **semicolon-separated** rather than comma-separated, so we have to pass `sep=';'` when reading them.

In [116]:
control_data = pd.read_csv('data/control_group.csv', sep = ';')
test_data = pd.read_csv('data/test_group.csv', sep = ';')
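To illustrate what the `sep` argument does, here is a minimal sketch that reads a small in-memory sample laid out like the raw test file. The two rows and the reduced set of columns are made up for the example; only the semicolon-separated layout mirrors the real data.

```python
import io
import pandas as pd

# Hypothetical two-row sample in the same semicolon-separated layout
raw = io.StringIO(
    "Campaign Name;Date;Spend [USD]\n"
    "Test Campaign;1.08.2019;3008\n"
    "Test Campaign;2.08.2019;2542\n"
)

# Without sep=';' pandas would read each line as a single column
sample = pd.read_csv(raw, sep=";")

print(sample.shape)          # (2, 3)
print(list(sample.columns))
```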


In [318]:
#Check out the data heads for each: 
control_data.head()

Unnamed: 0,Campaign,Date,Amount Spent,Impressions,Reach,Num Clicks,Num Searches,Num Views,Num Added to Cart,Num Purchases
0,Control Campaign,2019-01-08,2280,82702.0,56930.0,7016.0,2290.0,2159.0,1819.0,618.0
1,Control Campaign,2019-02-08,1757,121040.0,102513.0,8110.0,2033.0,1841.0,1219.0,511.0
2,Control Campaign,2019-03-08,2343,131711.0,110862.0,6508.0,1737.0,1549.0,1134.0,372.0
3,Control Campaign,2019-04-08,1940,72878.0,61235.0,3065.0,1042.0,982.0,1183.0,340.0
4,Control Campaign,2019-05-08,1835,109559.758621,88844.931034,5320.793103,2221.310345,1943.793103,1300.0,522.793103


In [118]:
test_data.head()

Unnamed: 0,Campaign Name,Date,Spend [USD],# of Impressions,Reach,# of Website Clicks,# of Searches,# of View Content,# of Add to Cart,# of Purchase
0,Test Campaign,1.08.2019,3008,39550,35820,3038,1946,1069,894,255
1,Test Campaign,2.08.2019,2542,100719,91236,4657,2359,1548,879,677
2,Test Campaign,3.08.2019,2365,70263,45198,7885,2572,2367,1268,578
3,Test Campaign,4.08.2019,2710,78451,25937,4216,2216,1437,566,340
4,Test Campaign,5.08.2019,2297,114295,95138,5863,2106,858,956,768


In [119]:
# Let's see how big these datasets are: 
print(f'Control dataset has a length of: {str(len(control_data))}')
print(f'Test dataset has a length of: {str(len(test_data))}')


Control dataset has a length of: 30
Test dataset has a length of: 30


Pretty small dataset!

### Cleanup 

The column names are verbose, so I will rename them to be more concise.

In [120]:
new_col_names = ['Campaign', 'Date', 'Amount Spent', 'Impressions', 'Reach', 'Num Clicks', 
                'Num Searches', 'Num Views', 'Num Added to Cart', 'Num Purchases']

control_data.columns = new_col_names
test_data.columns = new_col_names 

test_data.head(1)

Unnamed: 0,Campaign,Date,Amount Spent,Impressions,Reach,Num Clicks,Num Searches,Num Views,Num Added to Cart,Num Purchases
0,Test Campaign,1.08.2019,3008,39550,35820,3038,1946,1069,894,255


In [121]:
control_data.head(1)

Unnamed: 0,Campaign,Date,Amount Spent,Impressions,Reach,Num Clicks,Num Searches,Num Views,Num Added to Cart,Num Purchases
0,Control Campaign,1.08.2019,2280,82702.0,56930.0,7016.0,2290.0,2159.0,1819.0,618.0


Now I can check whether there are any missing values in the datasets.

In [122]:
control_data.isnull().sum()

Campaign             0
Date                 0
Amount Spent         0
Impressions          1
Reach                1
Num Clicks           1
Num Searches         1
Num Views            1
Num Added to Cart    1
Num Purchases        1
dtype: int64

In [123]:
test_data.isnull().sum()

Campaign             0
Date                 0
Amount Spent         0
Impressions          0
Reach                0
Num Clicks           0
Num Searches         0
Num Views            0
Num Added to Cart    0
Num Purchases        0
dtype: int64

Only the control data has missing values (one in each of seven columns).

Because the sample is already small, instead of dropping the rows with missing values, I will fill them with the mean of each affected column.

In [124]:
need_to_fill = ['Impressions', 'Reach', 'Num Clicks', 'Num Searches', 'Num Views',
               'Num Added to Cart', 'Num Purchases']

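for col in need_to_fill: 
    mean = control_data[col].mean()
    # Assign the result back instead of using inplace=True on a column selection,
    # which newer pandas versions warn about and may not apply to control_data
    control_data[col] = control_data[col].fillna(mean)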

In [125]:
control_data.isnull().sum()

Campaign             0
Date                 0
Amount Spent         0
Impressions          0
Reach                0
Num Clicks           0
Num Searches         0
Num Views            0
Num Added to Cart    0
Num Purchases        0
dtype: int64
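The same mean imputation can also be written without a loop. The sketch below runs on a made-up three-row frame (not the real control data) and relies on `fillna` accepting a Series of per-column fill values.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame with one missing row, mimicking the control data
df = pd.DataFrame({
    "Impressions": [100.0, np.nan, 300.0],
    "Num Purchases": [10.0, np.nan, 30.0],
})

cols = ["Impressions", "Num Purchases"]
# df[cols].mean() is a Series indexed by column name, so each column
# is filled with its own mean
df[cols] = df[cols].fillna(df[cols].mean())

print(df)
```

Keep in mind that mean imputation slightly shrinks the variance of the filled columns, which matters little for one row out of 30 but would matter with more missing data.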

Our datasets contain date values. Let's see what form they are in: 

In [126]:
print(control_data['Date'].dtype)
print(test_data['Date'].dtype)


object
object


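To make things easier later on when I do visualizations, I will convert the datatype of the Date columns into datetime. Note that the raw dates are written day-first (`1.08.2019` is 1 August 2019). Called without `dayfirst=True` or an explicit `format`, `pd.to_datetime` reads ambiguous values such as `1.08.2019` month-first, which is why dates like `2019-01-08` appear in the outputs of this notebook.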

In [127]:
control_data['Date'] = pd.to_datetime(control_data['Date'])
test_data['Date'] = pd.to_datetime(test_data['Date'])

print(control_data['Date'].dtype)
print(test_data['Date'].dtype)

datetime64[ns]
datetime64[ns]
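Because the raw dates are written day-first (e.g. `1.08.2019`), one way to rule out any day/month ambiguity is to pass an explicit `format`. The sketch below uses a few sample strings rather than the real files. Applying it to this notebook would change the parsed dates, and therefore the date-based filtering and outputs shown later, which were produced with the default parsing.

```python
import pandas as pd

# Sample date strings in the same style as the raw files
raw_dates = pd.Series(["1.08.2019", "2.08.2019", "13.08.2019"])

# An explicit day.month.year format removes the day/month ambiguity
parsed = pd.to_datetime(raw_dates, format="%d.%m.%Y")

print(parsed.dt.strftime("%Y-%m-%d").tolist())
# ['2019-08-01', '2019-08-02', '2019-08-13']
```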


### Descriptive Stats

We can get a quick overview of the descriptive stats for each dataset: 

In [128]:
control_stats = control_data.describe()
control_stats

Unnamed: 0,Amount Spent,Impressions,Reach,Num Clicks,Num Searches,Num Views,Num Added to Cart,Num Purchases
count,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0
mean,2288.433333,109559.758621,88844.931034,5320.793103,2221.310345,1943.793103,1300.0,522.793103
std,367.334451,21311.695472,21452.627592,1726.803732,851.025795,764.021907,400.371207,181.810508
min,1757.0,71274.0,42859.0,2277.0,1001.0,848.0,442.0,222.0
25%,1945.5,95191.25,75300.25,4122.25,1629.25,1249.0,942.5,375.5
50%,2299.5,112368.0,91418.0,5272.396552,2340.0,1979.5,1319.5,506.0
75%,2532.0,121259.0,101958.75,6609.5,2655.0,2360.5,1638.0,663.25
max,3083.0,145248.0,127852.0,8137.0,4891.0,4219.0,1913.0,800.0


In [129]:
test_stats = test_data.describe()
test_stats

Unnamed: 0,Amount Spent,Impressions,Reach,Num Clicks,Num Searches,Num Views,Num Added to Cart,Num Purchases
count,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0
mean,2563.066667,74584.8,53491.566667,6032.333333,2418.966667,1858.0,881.533333,521.233333
std,348.687681,32121.377422,28795.775752,1708.567263,388.742312,597.654669,347.584248,211.047745
min,1968.0,22521.0,10598.0,3038.0,1854.0,858.0,278.0,238.0
25%,2324.5,47541.25,31516.25,4407.0,2043.0,1320.0,582.5,298.0
50%,2584.0,68853.5,44219.5,6242.5,2395.5,1881.0,974.0,500.0
75%,2836.25,99500.0,78778.75,7604.75,2801.25,2412.0,1148.5,701.0
max,3112.0,133771.0,109834.0,8264.0,2978.0,2801.0,1391.0,890.0


## Hypothesis Testing 

To conduct hypothesis testing, I will use SciPy's two-sample tests (`ttest_ind` or `mannwhitneyu`, depending on whether the data looks normally distributed). 

My hypotheses are as follows: 

__Null Hypothesis (H<sub>0</sub>)__: There is no difference in purchases between the control campaign and the test campaign

__Alternative Hypothesis (H<sub>A</sub>)__: There is a difference in purchases between the control campaign and the test campaign. In its one-sided form: the test campaign shows a higher number of purchases than the control campaign

### Test for Normal Distribution 
Before conducting hypothesis testing, I want to see if the datasets are normally distributed. Because the datasets are small, I will use the __Shapiro-Wilk test__. If normality is not rejected for either group, I can use a t-test; otherwise I will use a non-parametric test. 

In [130]:
# ttest_ind will be used if both samples are normally distributed;
# mannwhitneyu will be used if at least one of them is not
from scipy.stats import shapiro, ttest_ind, mannwhitneyu

In [131]:
#1. Extract the columns we want from each dataset: 
purchases_control = control_data['Num Purchases']
purchases_test = test_data['Num Purchases']

#2. Conduct the Shapiro-Wilk test for normality on the control group
stat_control, p_value_control = shapiro(purchases_control)
print('Control Group:')
print('Shapiro-Wilk Test Statistic:', stat_control)
print('p-value:', p_value_control)

# Conducting the Shapiro-Wilk test for normality on the test group
stat_test, p_value_test = shapiro(purchases_test)
print('\nTest Group:')
print('Shapiro-Wilk Test Statistic:', stat_test)
print('p-value:', p_value_test)

# Interpretation of normality test
alpha = 0.05
if p_value_control > alpha:
    print("Control Group: Data is normally distributed")
else:
    print("Control Group: Data is not normally distributed")

if p_value_test > alpha:
    print("Test Group: Data is normally distributed")
else:
    print("Test Group: Data is not normally distributed")


Control Group:
Shapiro-Wilk Test Statistic: 0.9432733058929443
p-value: 0.11144547164440155

Test Group:
Shapiro-Wilk Test Statistic: 0.9181894659996033
p-value: 0.024077769368886948
Control Group: Data is normally distributed
Test Group: Data is not normally distributed
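The decision rule used here (t-test if both groups pass the normality check, Mann-Whitney U otherwise) can be sketched as a small helper. `choose_test` is a hypothetical name introduced for this illustration, and the samples are synthetic rather than the campaign data. Keep in mind that a p-value above alpha only means normality was not rejected, not that the data is proven normal.

```python
import numpy as np
from scipy.stats import shapiro

def choose_test(sample_a, sample_b, alpha=0.05):
    """Return the name of the two-sample test suggested by Shapiro-Wilk.

    Hypothetical helper: suggests 'ttest_ind' only when normality is
    not rejected for both samples at the given alpha.
    """
    _, p_a = shapiro(sample_a)
    _, p_b = shapiro(sample_b)
    if p_a > alpha and p_b > alpha:
        return "ttest_ind"
    return "mannwhitneyu"

rng = np.random.default_rng(0)
skewed = rng.exponential(scale=500, size=200)              # clearly non-normal
roughly_normal = rng.normal(loc=500, scale=200, size=200)  # drawn from a normal

# At least one sample is strongly skewed, so the non-parametric test is suggested
print(choose_test(skewed, roughly_normal))
```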


#### Mann Whitney U Test
Because our control group is normally distributed and the test group is not (at the 5% significance level), I will use the __Mann-Whitney U test__ for the hypothesis test.


In [132]:
u_stat, p_value_u = mannwhitneyu(purchases_test, purchases_control, alternative='greater')
print('\nMann-Whitney U Test:')
print('U-statistic:', u_stat)
print('P-value:', p_value_u)

# Interpretation
if p_value_u < alpha:
    print("Reject the null hypothesis - There are more purchases from the test campaign than in the control campaign")
else:
    print("Fail to reject the null hypothesis - There is no significant difference in purchases between the test and control campaigns")


Mann-Whitney U Test:
U-statistic: 445.0
P-value: 0.5324067103535307
Fail to reject the null hypothesis - There is no significant difference in purchases between the test and control campaigns


At the 5% significance level (95% confidence), we fail to reject the null hypothesis, indicating there is no significant difference in purchases between the test campaign and the control campaign.
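One rough way to read the U-statistic reported above is to divide it by the number of (test day, control day) pairs. This gives the share of pairs in which the test campaign had more purchases, with ties counted as half. This is an added illustration rather than part of the original analysis, and it assumes the reported statistic refers to the first sample passed to `mannwhitneyu` (the test group), which is the convention in recent SciPy versions.

```python
# Values reported in the notebook output above
u_stat = 445.0
n_test, n_control = 30, 30

# Share of (test day, control day) pairs in which the test day wins,
# with ties counted as half a win
prob_test_greater = u_stat / (n_test * n_control)

print(round(prob_test_greater, 3))  # 0.494
```

A value close to 0.5 means that neither campaign tends to have more purchases than the other, which agrees with the large p-value.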

__What if we use a different confidence level?__ <br>
Say, 99%? 

Let's check it out! First, I will repeat the normality test at this confidence level (alpha = 0.01), and then choose the hypothesis test based on the result. 

In [133]:
stat_control, p_value_control = shapiro(purchases_control)
print('Control Group:')
print('Shapiro-Wilk Test Statistic:', stat_control)
print('p-value:', p_value_control)

# Conducting the Shapiro-Wilk test for normality on the test group
stat_test, p_value_test = shapiro(purchases_test)
print('\nTest Group:')
print('Shapiro-Wilk Test Statistic:', stat_test)
print('p-value:', p_value_test)

# Interpretation of normality test
alpha = 0.01
if p_value_control > alpha:
    print("Control Group: Data is normally distributed")
else:
    print("Control Group: Data is not normally distributed")

if p_value_test > alpha:
    print("Test Group: Data is normally distributed")
else:
    print("Test Group: Data is not normally distributed")

Control Group:
Shapiro-Wilk Test Statistic: 0.9432733058929443
p-value: 0.11144547164440155

Test Group:
Shapiro-Wilk Test Statistic: 0.9181894659996033
p-value: 0.024077769368886948
Control Group: Data is normally distributed
Test Group: Data is normally distributed


In [134]:
## Under a 99% confidence level, normality is not rejected for either group. Can use ttest_ind from scipy!

# Conduct the t-test. Note that ttest_ind is two-sided by default;
# passing alternative='greater' would give the one-sided version
t, p_val = ttest_ind(purchases_test, purchases_control)

# Output the t-statistic and p-value
print('T-statistic:', t)
print('P-value:', p_val)

# Interpretation, with 99% confidence (alpha = 0.01)
alpha = 0.01
if p_val < alpha:
    print("Reject the null hypothesis - There is a significant difference in purchases between the test and control campaigns")
else:
    print("Fail to reject the null hypothesis - There is no significant difference in purchases between the test and control campaigns")

T-statistic: -0.03066909523750146
P-value: 0.9756387309702421
Fail to reject the null hypothesis - There is no significant difference in purchases between the test and control campaigns


While both of our datasets are treated as normally distributed at the 99% confidence level, our conclusion remains the same: __we fail to reject the null hypothesis, and there is no significant difference in purchases between the test and control campaigns__. 
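The `ttest_ind` call above does not pass an `alternative`, so the reported p-value is two-sided. As a sketch of how the two versions relate, the example below runs both on synthetic stand-in samples (not the real purchase columns) and then applies the same rule to the values reported above. It assumes SciPy 1.6 or later, where `ttest_ind` accepts `alternative`.

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic stand-ins for the two purchase columns (not the real data)
rng = np.random.default_rng(42)
test_sample = rng.normal(loc=520, scale=210, size=30)
control_sample = rng.normal(loc=520, scale=180, size=30)

t_two, p_two = ttest_ind(test_sample, control_sample)
t_one, p_one = ttest_ind(test_sample, control_sample, alternative="greater")

# The t distribution is symmetric, so the one-sided p-value follows
# from the two-sided one and the sign of t
expected_one = p_two / 2 if t_two > 0 else 1 - p_two / 2
print(np.isclose(p_one, expected_one))  # True

# Same rule applied to the notebook's reported values (t is negative)
print(round(1 - 0.9756387309702421 / 2, 4))  # 0.5122
```

Either way the p-value is far above 0.01, so the conclusion does not depend on which version is used.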

## Comparing Campaigns Using Visualizations
Hypothesis testing is a critical method to use while conducting A/B testing. Another useful method in comparing the campaigns is through visualizations, as it can make all the numbers clearer to the reader. 

### Feature Engineering 
Before creating visualizations, it is a good idea to add some metrics to our datasets and then merge the datasets into one. The metrics are: 
* __Click-Through Rate (CTR)__: The ratio of clicks on an ad to the number of times it was shown, calculated by Num Clicks / Impressions (times 100 to get a percentage)
* __Cost Per Click (CPC)__: Average cost for each click on an ad/link, calculated by Amount Spent / Num Clicks
* __Conversion Rate (CR)__: Percent of those who clicked a link that actually purchased the product, calculated by Num Purchases / Num Clicks * 100
* __Return on Investment (ROI)__: Profitability of an investment relative to its cost, calculated by revenue generated minus the Amount Spent, all divided by the Amount Spent (times 100 to get a percentage). The dataset has no revenue column, so Num Purchases is used as a stand-in for revenue; the resulting values are only useful for comparing the two campaigns and are not a true ROI 
* __% Unique Impressions__: The percentage of impressions that are unique, calculated by dividing Reach (# unique impressions) by Impressions, multiplied by 100 
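As a worked example of the formulas listed above, the snippet below computes each metric by hand for the first control-campaign row shown earlier (Amount Spent 2280, Impressions 82702, Reach 56930, Num Clicks 7016, Num Purchases 618). The variable names are chosen for this illustration only.

```python
# First control-campaign row, as shown in the data head above
amount_spent = 2280
impressions = 82702
reach = 56930
clicks = 7016
purchases = 618

ctr = clicks / impressions * 100                       # Click-Through Rate (%)
cpc = amount_spent / clicks                            # Cost Per Click
cr = purchases / clicks * 100                          # Conversion Rate (%)
roi = (purchases - amount_spent) / amount_spent * 100  # purchases stand in for revenue
unique = reach / impressions * 100                     # % Unique Impressions

print(round(ctr, 2), round(cpc, 3), round(cr, 2), round(roi, 2), round(unique, 2))
# 8.48 0.325 8.81 -72.89 68.84
```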

To make visualizing easier, I will combine the dataframes, and then add the new features. 

In [321]:
merged_df = pd.concat([control_data, test_data], ignore_index = True)
merged_df.head()

Unnamed: 0,Campaign,Date,Amount Spent,Impressions,Reach,Num Clicks,Num Searches,Num Views,Num Added to Cart,Num Purchases
0,Control Campaign,2019-01-08,2280,82702.0,56930.0,7016.0,2290.0,2159.0,1819.0,618.0
1,Control Campaign,2019-02-08,1757,121040.0,102513.0,8110.0,2033.0,1841.0,1219.0,511.0
2,Control Campaign,2019-03-08,2343,131711.0,110862.0,6508.0,1737.0,1549.0,1134.0,372.0
3,Control Campaign,2019-04-08,1940,72878.0,61235.0,3065.0,1042.0,982.0,1183.0,340.0
4,Control Campaign,2019-05-08,1835,109559.758621,88844.931034,5320.793103,2221.310345,1943.793103,1300.0,522.793103


In [322]:
merged_df['Click-Through Rate (%)'] = (merged_df['Num Clicks'] / merged_df['Impressions']) * 100 

merged_df['Cost Per Click'] = merged_df['Amount Spent'] / merged_df['Num Clicks']

merged_df['Conversion Rate (%)'] = (merged_df['Num Purchases']/ merged_df['Num Clicks']) * 100 

merged_df['ROI (%)'] = ((merged_df['Num Purchases'] - merged_df['Amount Spent']) / merged_df['Amount Spent']) * 100

merged_df['% Unique Impressions'] = (merged_df['Reach'] / merged_df['Impressions']) * 100

merged_df.head()

Unnamed: 0,Campaign,Date,Amount Spent,Impressions,Reach,Num Clicks,Num Searches,Num Views,Num Added to Cart,Num Purchases,Click-Through Rate (%),Cost Per Click,Conversion Rate (%),ROI (%),% Unique Impressions
0,Control Campaign,2019-01-08,2280,82702.0,56930.0,7016.0,2290.0,2159.0,1819.0,618.0,8.483471,0.324971,8.808438,-72.894737,68.837513
1,Control Campaign,2019-02-08,1757,121040.0,102513.0,8110.0,2033.0,1841.0,1219.0,511.0,6.700264,0.216646,6.300863,-70.916335,84.69349
2,Control Campaign,2019-03-08,2343,131711.0,110862.0,6508.0,1737.0,1549.0,1134.0,372.0,4.941121,0.360018,5.716042,-84.122919,84.170646
3,Control Campaign,2019-04-08,1940,72878.0,61235.0,3065.0,1042.0,982.0,1183.0,340.0,4.205659,0.632953,11.092985,-82.474227,84.023985
4,Control Campaign,2019-05-08,1835,109559.758621,88844.931034,5320.793103,2221.310345,1943.793103,1300.0,522.793103,4.856521,0.344873,9.825473,-71.509913,81.092668


In [324]:
#merged_df.to_csv('ab_test_analysis_merged.csv')

#### Visualizations 
Some visualizations that we can create are: 
* Time series depicting the purchases

In [137]:
#merged_df.resample(rule='M', on='Date')['Num Purchases'].mean()

#### Get Relevant Data
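In our data, as parsed above, the dates fall on the 8th of every month of the year, plus the majority of the month of August. Because the raw dates are written day-first (`1.08.2019`, `2.08.2019`, and so on), these twelve "8th of the month" rows are most likely 1 to 12 August read month-first rather than genuinely monthly observations. Keeping the dates as parsed, I will only use the rows dated the 8th of each month here, so that each month has equal representation. 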

In [320]:
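# Convert the strings to datetimes first; newer pandas versions warn when
# isin is given strings for a datetime column
eighth_of_month_dates = pd.to_datetime(['2019-01-08', '2019-02-08','2019-03-08','2019-04-08','2019-05-08','2019-06-08',
                                 '2019-07-08','2019-08-08','2019-09-08','2019-10-08','2019-11-08','2019-12-08'])
eighth_of_month_df = merged_df[merged_df['Date'].isin(eighth_of_month_dates)]

# reset_index returns a new frame, which avoids modifying a slice of merged_df in place
eighth_of_month_df = eighth_of_month_df.reset_index()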

len(eighth_of_month_df)

24

In [139]:
eighth_of_month_df[eighth_of_month_df['Campaign'] == 'Test Campaign'].describe()

Unnamed: 0,index,Amount Spent,Impressions,Reach,Num Clicks,Num Searches,Num Views,Num Added to Cart,Num Purchases,Click-Through Rate (%),Cost Per Click,Conversion Rate (%),ROI (%)
count,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0
mean,35.5,2652.25,73533.833333,48352.416667,6079.166667,2403.166667,1825.166667,1005.583333,577.0,9.878464,0.494115,10.419592,-77.842353
std,3.605551,233.04901,30852.350639,28645.363891,1996.457719,379.73695,631.626016,297.423621,217.897808,5.886431,0.209876,5.217351,8.841265
min,30.0,2297.0,33669.0,10598.0,3038.0,1854.0,858.0,424.0,255.0,4.483876,0.299937,3.384615,-91.522606
25%,32.75,2448.5,44804.25,30101.0,4219.75,2099.75,1346.0,881.25,408.25,5.312968,0.338992,7.127054,-85.778078
50%,35.5,2681.0,74357.0,38984.0,6523.5,2335.5,1898.0,1015.5,623.0,7.750055,0.39884,8.48653,-75.25805
75%,38.25,2832.75,96470.25,73372.5,7945.0,2766.5,2237.25,1247.0,723.75,12.80231,0.643425,13.458636,-71.457492
max,41.0,3008.0,124591.0,95138.0,8264.0,2899.0,2761.0,1382.0,890.0,21.337135,0.990125,21.085051,-66.565085


In [140]:
import plotly.graph_objects as go

In [141]:
control = eighth_of_month_df[eighth_of_month_df['Campaign'] == 'Control Campaign']
test = eighth_of_month_df[eighth_of_month_df['Campaign'] == 'Test Campaign']

fig = go.Figure()
# Create and style traces
fig.add_trace(go.Scatter(x= control['Date'], y= control['Num Purchases'], mode='lines', name='Control Campaign',
                         line=dict(color='firebrick', width=4)))
fig.add_trace(go.Scatter(x= test['Date'], y= test['Num Purchases'], name = 'Test Campaign',
                         line=dict(color='royalblue', width=4)))


fig.update_layout(
    title="Number of Purchases Over Time",
    xaxis_title="Date",
    yaxis_title="Number of Purchases",
    legend_title="Campaign Type",
    font=dict(
        family="Overpass",
        size=18,
        color="Black"
    )
)


fig.show()

Insert Analysis Here

In [None]:
## Basic Metrics 



In [149]:
## Bar Graphs for the %s: CTR, CR, ROI (can do all on one plot) 

metrics = ['Click-Through Rate', 'Conversion Rate', 'Return on Investment']
control_mean_ctr = round(control['Click-Through Rate (%)'].mean(), 2)
control_mean_cr = round(control['Conversion Rate (%)'].mean(), 2)
control_mean_roi = round(control['ROI (%)'].mean(), 2)
control_mean_metrics = [control_mean_ctr, control_mean_cr, control_mean_roi]

test_mean_ctr = round(test['Click-Through Rate (%)'].mean(), 2)
test_mean_cr = round(test['Conversion Rate (%)'].mean(), 2)
test_mean_roi = round(test['ROI (%)'].mean(), 2)
test_mean_metrics = [test_mean_ctr, test_mean_cr, test_mean_roi]

fig = go.Figure(data=[
    go.Bar(name='Control Campaign', x= metrics, y=control_mean_metrics),
    go.Bar(name='Test Campaign', x=metrics, y=test_mean_metrics)
])
# Change the bar mode
#fig.update_layout(barmode='group')
fig.update_layout(
    title="Calculated Campaign Metrics",
    xaxis_title="Metric Type",
    yaxis_title="Mean Metric Value (%)",
    legend_title="Campaign Type",
    font=dict(
        family="Overpass",
        size=18,
        color="Black"
    ), 
    barmode = 'group'
)
fig.show()
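The per-campaign means computed one by one above can also be obtained in a single `groupby` call. The sketch below uses a made-up four-row frame with two of the rate columns; `demo`, `rate_cols` and `mean_metrics` are names introduced for this illustration.

```python
import pandas as pd

# Hypothetical miniature version of the merged data with two rate columns
demo = pd.DataFrame({
    "Campaign": ["Control Campaign", "Control Campaign", "Test Campaign", "Test Campaign"],
    "Click-Through Rate (%)": [8.0, 6.0, 10.0, 12.0],
    "Conversion Rate (%)": [9.0, 7.0, 8.0, 10.0],
})

rate_cols = ["Click-Through Rate (%)", "Conversion Rate (%)"]
# One row per campaign, one column per metric
mean_metrics = demo.groupby("Campaign")[rate_cols].mean().round(2)

print(mean_metrics)
control_means = mean_metrics.loc["Control Campaign"].tolist()  # [7.0, 8.0]
test_means = mean_metrics.loc["Test Campaign"].tolist()        # [11.0, 9.0]
```

The resulting lists have the same shape as `control_mean_metrics` and `test_mean_metrics`, so they could be passed to the bar traces in the same way.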

Ideas for vizs: <br> 
* Distributions of Impressions, Reach, Num Clicks, Num Searches, Num Views, Num Added to Cart, Num Purchases
* Month of August time series control campaign stats vs test campaign stats

In [155]:
from plotly.subplots import make_subplots

In [261]:
def make_hist(col_name, control_mean_annotation_offset, test_mean_annotation_offset, y, bargap = 0): 
    '''Function to create grouped histograms (uses the `control` and `test` DataFrames defined above)
    Parameters- 
    * col_name (str): name of column to base the viz on 
    * control_mean_annotation_offset (int or float): Where to place the annotation for the control group's 
    vertical mean line (dashed) with respect to the vertical line (negative values place the annotation to the left of 
    the line, positive values place annotation to the right of the line)
    * test_mean_annotation_offset (int or float): same as control_mean_annotation_offset, except for the test group's 
    vertical mean line (solid, green line)
    * y (int or float): indicates how high or low on the viz the annotations will be 
    * bargap (int or float, optional): Adds space between bar groups. Default is 0, or no gap
    '''
    x0 = control[col_name]
    x1 = test[col_name]

    fig = go.Figure()
    fig.add_trace(go.Histogram(x=x0, name = 'Control Campaign'))
    fig.add_trace(go.Histogram(x=x1, name = 'Test Campaign'))
    fig.add_vline(x= control[col_name].mean(), line_width=1.5, line_dash="dash", line_color="black")#, annotation_text= 'Control Avg')
    fig.add_annotation(x=control[col_name].mean() + control_mean_annotation_offset, y= y,
                text="Control Avg",
                showarrow=False,
                yshift=10)

    fig.add_vline(x= test[col_name].mean(), line_width=1.5, line_color="green")#, annotation_text= 'Test Avg')
    fig.add_annotation(x=test[col_name].mean() + test_mean_annotation_offset, y=y,
                text="Test Avg",
                showarrow=False,
                yshift=10)

    fig.update_layout(
        title=f"Distribution of {col_name}",
        xaxis_title="Value",
        yaxis_title="Count",
        legend_title="Campaign Type",
        font=dict(
            family="Overpass",
            size=18,
            color="Black"
        ), barmode = 'group', 
    bargap = bargap)
    fig.show()
    
    
#make_graph('Num Purchases', -70, 60, 10)

In [262]:
#impressions
make_hist('Impressions', 10000, -10000, 5, 0.05)

In [263]:
make_hist('Reach', 10000, -9000, 5, 0.05)

In [264]:
make_hist('Num Clicks', -550, 400, 5, 0.05)

In [265]:
make_hist('Num Searches', -200, 150, 10, 0.05)

In [266]:
make_hist('Num Views', -200, 150, 10, 0.05)

In [267]:
make_hist('Num Added to Cart', 120, -100, 10, 0.1)

In [268]:
make_hist('Num Purchases', -60, 50, 8, 0.1)

### Was the amount spent worth it? 

Some ways we can determine this: 



### Month of August
Previously, we were only looking at the values of the 8th of every month. The original data includes the majority of August, from the 8th of the month through the 30th, which we can analyze on a daily basis. 

In [273]:
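# Convert the strings to datetimes first; newer pandas versions warn when
# isin is given strings for a datetime column
non_august_dates = pd.to_datetime(['2019-01-08', '2019-02-08','2019-03-08','2019-04-08','2019-05-08','2019-06-08',
                                 '2019-07-08','2019-09-08','2019-10-08','2019-11-08','2019-12-08'])
august_df = merged_df[~merged_df['Date'].isin(non_august_dates)]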

august_control = august_df[august_df['Campaign'] == 'Control Campaign']

august_test = august_df[august_df['Campaign'] == 'Test Campaign']


In [276]:
#august_test
#august_control


In [297]:
fig = go.Figure()
# Create and style traces
fig.add_trace(go.Scatter(x= august_control['Date'], y= august_control['Impressions'], mode='lines', name='Control Campaign',
                         line=dict(color='firebrick', width=4)))
fig.add_trace(go.Scatter(x= august_test['Date'], y= august_test['Impressions'], name = 'Test Campaign',
                         line=dict(color='royalblue', width=4)))
fig.add_hline(y= august_control['Impressions'].mean(), line_width=1.5, line_dash="dash", line_color="black") #, annotation_text= 'Control Avg')
fig.add_annotation(x = '2019-08-08', y = august_control['Impressions'].mean() + 5000, 
                   text="Control Avg",showarrow=False)

fig.add_hline(y= august_test['Impressions'].mean(), line_width=1.5, line_color="green") #, annotation_text= 'Test Avg')
fig.add_annotation(x = '2019-08-08', y = august_test['Impressions'].mean() - 5000, 
                   text="Test Avg",showarrow=False)

fig.update_layout(
    title="Number of Impressions Over Time (August 2019)",
    xaxis_title="Date",
    yaxis_title="Number of Impressions",
    legend_title="Campaign Type",
    font=dict(
        family="Overpass",
        size=18,
        color="Black"
    )
)


fig.show()

In [298]:
fig = go.Figure()
# Create and style traces
fig.add_trace(go.Scatter(x= august_control['Date'], y= august_control['Reach'], mode='lines', name='Control Campaign',
                         line=dict(color='firebrick', width=4)))
fig.add_trace(go.Scatter(x= august_test['Date'], y= august_test['Reach'], name = 'Test Campaign',
                         line=dict(color='royalblue', width=4)))
fig.add_hline(y= august_control['Reach'].mean(), line_width=1.5, line_dash="dash", line_color="black") #, annotation_text= 'Control Avg')
fig.add_annotation(x = '2019-08-08', y = august_control['Reach'].mean() + 5000, 
                   text="Control Avg",showarrow=False)

fig.add_hline(y= august_test['Reach'].mean(), line_width=1.5, line_color="green") #, annotation_text= 'Test Avg')
fig.add_annotation(x = '2019-08-08', y = august_test['Reach'].mean() - 5000, 
                   text="Test Avg",showarrow=False)

fig.update_layout(
    title="Reach Over Time (August 2019)",
    xaxis_title="Date",
    yaxis_title="Reach Amount",
    legend_title="Campaign Type",
    font=dict(
        family="Overpass",
        size=18,
        color="Black"
    )
)


fig.show()

In [302]:
fig = go.Figure()
# Create and style traces
fig.add_trace(go.Scatter(x= august_control['Date'], y= august_control['Num Clicks'], mode='lines', name='Control Campaign',
                         line=dict(color='firebrick', width=4)))
fig.add_trace(go.Scatter(x= august_test['Date'], y= august_test['Num Clicks'], name = 'Test Campaign',
                         line=dict(color='royalblue', width=4)))
fig.add_hline(y= august_control['Num Clicks'].mean(), line_width=1.5, line_dash="dash", line_color="black") #, annotation_text= 'Control Avg')
fig.add_annotation(x = '2019-08-08', y = august_control['Num Clicks'].mean() - 450, 
                   text="Control Avg",showarrow=False)

fig.add_hline(y= august_test['Num Clicks'].mean(), line_width=1.5, line_color="green") #, annotation_text= 'Test Avg')
fig.add_annotation(x = '2019-08-08', y = august_test['Num Clicks'].mean() + 450, 
                   text="Test Avg",showarrow=False)

fig.update_layout(
    title="Number of Clicks Over Time (August 2019)",
    xaxis_title="Date",
    yaxis_title="Amount of Clicks",
    legend_title="Campaign Type",
    font=dict(
        family="Overpass",
        size=18,
        color="Black"
    )
)


fig.show()

In [309]:
# Num Searches
fig = go.Figure()
# Create and style traces
fig.add_trace(go.Scatter(x= august_control['Date'], y= august_control['Num Searches'], mode='lines', name='Control Campaign',
                         line=dict(color='firebrick', width=4)))
fig.add_trace(go.Scatter(x= august_test['Date'], y= august_test['Num Searches'], name = 'Test Campaign',
                         line=dict(color='royalblue', width=4)))
fig.add_hline(y= august_control['Num Searches'].mean(), line_width=1.5, line_dash="dash", line_color="black") #, annotation_text= 'Control Avg')
fig.add_annotation(x = '2019-08-08', y = august_control['Num Searches'].mean() - 200, 
                   text="Control Avg",showarrow=False)

fig.add_hline(y= august_test['Num Searches'].mean(), line_width=1.5, line_color="green") #, annotation_text= 'Test Avg')
fig.add_annotation(x = '2019-08-08', y = august_test['Num Searches'].mean() + 200, 
                   text="Test Avg",showarrow=False)

fig.update_layout(
    title="Number of Searches Over Time (August 2019)",
    xaxis_title="Date",
    yaxis_title="Amount of Searches",
    legend_title="Campaign Type",
    font=dict(
        family="Overpass",
        size=18,
        color="Black"
    )
)


fig.show()


In [307]:
#Num Views
fig = go.Figure()
# Create and style traces
fig.add_trace(go.Scatter(x= august_control['Date'], y= august_control['Num Views'], mode='lines', name='Control Campaign',
                         line=dict(color='firebrick', width=4)))
fig.add_trace(go.Scatter(x= august_test['Date'], y= august_test['Num Views'], name = 'Test Campaign',
                         line=dict(color='royalblue', width=4)))
fig.add_hline(y= august_control['Num Views'].mean(), line_width=1.5, line_dash="dash", line_color="black") #, annotation_text= 'Control Avg')
fig.add_annotation(x = '2019-08-08', y = august_control['Num Views'].mean() + 100, 
                   text="Control Avg",showarrow=False)

fig.add_hline(y= august_test['Num Views'].mean(), line_width=1.5, line_color="green") #, annotation_text= 'Test Avg')
fig.add_annotation(x = '2019-08-08', y = august_test['Num Views'].mean() - 100, 
                   text="Test Avg",showarrow=False)

fig.update_layout(
    title="Number of Views Over Time (August 2019)",
    xaxis_title="Date",
    yaxis_title="Amount of Views",
    legend_title="Campaign Type",
    font=dict(
        family="Overpass",
        size=18,
        color="Black"
    )
)


fig.show()



In [312]:
##Num Purchases

fig = go.Figure()
# Create and style traces
fig.add_trace(go.Scatter(x= august_control['Date'], y= august_control['Num Purchases'], mode='lines', name='Control Campaign',
                         line=dict(color='firebrick', width=4)))
fig.add_trace(go.Scatter(x= august_test['Date'], y= august_test['Num Purchases'], name = 'Test Campaign',
                         line=dict(color='royalblue', width=4)))
fig.add_hline(y= august_control['Num Purchases'].mean(), line_width=1.5, line_dash="dash", line_color="black") #, annotation_text= 'Control Avg')
fig.add_annotation(x = '2019-08-11', y = august_control['Num Purchases'].mean() + 30, 
                   text="Control Avg",showarrow=False)

fig.add_hline(y= august_test['Num Purchases'].mean(), line_width=1.5, line_color="green") #, annotation_text= 'Test Avg')
fig.add_annotation(x = '2019-08-11', y = august_test['Num Purchases'].mean() - 30, 
                   text="Test Avg",showarrow=False)

fig.update_layout(
    title="Number of Purchases Over Time (August 2019)",
    xaxis_title="Date",
    yaxis_title="Number of Purchases",
    legend_title="Campaign Type",
    font=dict(
        family="Overpass",
        size=18,
        color="Black"
    )
)


fig.show()

### Results
The results for this ad campaign are relatively straightforward. Based on the hypothesis testing alone, there is no evidence that the test campaign produced significantly more purchases than the control campaign. Based on the visualizations alone, the control and test campaigns showed similar patterns in metrics over time, and similar distributions. On average, the control campaign outperformed the test campaign across most of the features, with the exception of the number of searches and the number of clicks. 

### Conclusion

#### Pitfalls/What Could Be Done Differently 
With this campaign, there are two main pitfalls that could be changed: <br>
* The __amount of data__, and 
* The __metrics tracked__ <br> 

__AMOUNT OF DATA__: <br>
As with most data-based business problems, the more data (the larger the sample size) we have, the better. In an A/B test, more data allows the results to generalize better. The dataset provided has a total of 60 samples: 30 for the control campaign and 30 for the test campaign. Of those 60, 36 cover the majority of the month of August (18 days of August for the control campaign, and the same days for the test campaign). That leaves 24 samples dated the 8th of each month (12 for the control, 12 for the test; including August 8th). This is very little data. To get a better idea of how the campaigns performed, it would be better to have data from every single day of the campaign, instead of just one day out of each month and 18 days in August. This assumes that the original curator of the dataset did not simply want to compare the campaigns on the 8th of every month for the year. 
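To give a rough sense of how much data would be enough, the sketch below applies the standard normal-approximation sample size formula for comparing two means. All inputs are assumptions chosen for illustration: a daily purchase standard deviation of about 200 (near the 182 and 211 reported by `describe()`), a hypothetical smallest difference worth detecting of 50 purchases per day, a one-sided alpha of 0.05 and 80% power.

```python
import math
from scipy.stats import norm

# Assumptions for illustration only (not from the original analysis)
sigma = 200.0    # rough daily purchase standard deviation
delta = 50.0     # hypothetical smallest difference in mean daily purchases worth detecting
alpha = 0.05     # one-sided significance level
power = 0.80     # desired probability of detecting a true difference of size delta

z_alpha = norm.ppf(1 - alpha)
z_beta = norm.ppf(power)

# Normal-approximation sample size per group for a one-sided two-sample comparison
n_per_group = math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)

print(n_per_group)  # 198
```

Under these assumed inputs, each campaign would need on the order of 200 daily observations, compared with the 30 available here.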


__METRICS TRACKED__: <br> 
While the metrics provided in the dataset do give important insight into how the campaigns fared, there are a few other 

