# Instagram Reach A/B Test
### 1. Import data libraries and datasets

In [12]:
import pandas as pd

#datetime.timedelta represents a duration (days, hours, minutes, seconds), which makes it easy to convert between units
import datetime
from datetime import date, timedelta

#for visualization
import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio
pio.templates.default = "plotly_white"

#import the csv files, note ; as the delimiter
control_data = pd.read_csv(r"C:\Users\haley\OneDrive\Documents\GitHub Projects\AB Testing\control_group.csv", sep = ";")
test_data = pd.read_csv(r"C:\Users\haley\OneDrive\Documents\GitHub Projects\AB Testing\test_group.csv", sep = ";")

In [6]:
#example of timedelta
A = datetime.timedelta(hours = 55)
A.total_seconds()

198000.0
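
The same duration can be expressed in other units as well. The sketch below uses only the standard library; the variable names are illustrative and not part of the original notebook.

```python
from datetime import timedelta

duration = timedelta(hours=55)

# total_seconds() returns the whole duration as a float number of seconds.
seconds = duration.total_seconds()

# Dividing one timedelta by another returns a plain float:
# how many of the divisor fit into the duration.
minutes = duration / timedelta(minutes=1)
days = duration / timedelta(days=1)

print(seconds, minutes, days)
```

Dividing by a unit-sized `timedelta` avoids hard-coding conversion factors such as 60 or 3600.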

In [14]:
#check datasets (a cell displays only its last expression, so the output below is test_data.head())
control_data.head()
test_data.head()

Unnamed: 0,Campaign Name,Date,Spend [USD],# of Impressions,Reach,# of Website Clicks,# of Searches,# of View Content,# of Add to Cart,# of Purchase
0,Test Campaign,1.08.2019,3008,39550,35820,3038,1946,1069,894,255
1,Test Campaign,2.08.2019,2542,100719,91236,4657,2359,1548,879,677
2,Test Campaign,3.08.2019,2365,70263,45198,7885,2572,2367,1268,578
3,Test Campaign,4.08.2019,2710,78451,25937,4216,2216,1437,566,340
4,Test Campaign,5.08.2019,2297,114295,95138,5863,2106,858,956,768


### 2. Data Preparation
Some column names are unclear (for example, `# of Impressions`), so rename them below.

In [17]:
control_data.columns = ["Campaign Name", "Date", "Amount Spent",
                        "Number of Impressions", "Reach","Website Visits", 
                       "Searches Received", "Content Viewed", 
                       "Added to Cart", "Purchases"]
test_data.columns = ["Campaign Name", "Date", "Amount Spent",
                        "Number of Impressions", "Reach", "Website Visits", 
                       "Searches Received", "Content Viewed", 
                       "Added to Cart", "Purchases"]

In [20]:
#count the null values in each column
print(control_data.isnull().sum())
print(test_data.isnull().sum())

Campaign Name            0
Date                     0
Amount Spent             0
Number of Impressions    1
Reach                    1
Website Visits           1
Searches Received        1
Content Viewed           1
Added to Cart            1
Purchases                1
dtype: int64
Campaign Name            0
Date                     0
Amount Spent             0
Number of Impressions    0
Reach                    0
Website Visits           0
Searches Received        0
Content Viewed           0
Added to Cart            0
Purchases                0
dtype: int64


In [23]:
#fill in the missing values. Replace nulls with the mean value of each column.
#Assign the result back to the column: calling fillna(inplace=True) on a selected
#column is chained assignment, which newer pandas versions warn about.
metric_columns = ["Number of Impressions", "Reach", "Website Visits",
                  "Searches Received", "Content Viewed",
                  "Added to Cart", "Purchases"]
for column in metric_columns:
    control_data[column] = control_data[column].fillna(control_data[column].mean())


### 3. Create a merged dataset (control + test)

In [27]:
ab_data = control_data.merge(test_data,
                            how="outer").sort_values(["Date"])
ab_data = ab_data.reset_index(drop=True) #replaces the index with a fresh 0..n-1 index; drop=True discards the old index instead of adding it as a column
print(ab_data.head())

      Campaign Name        Date  Amount Spent  Number of Impressions    Reach  \
0  Control Campaign   1.08.2019          2280                82702.0  56930.0   
1     Test Campaign   1.08.2019          3008                39550.0  35820.0   
2     Test Campaign  10.08.2019          2790                95054.0  79632.0   
3  Control Campaign  10.08.2019          2149               117624.0  91257.0   
4     Test Campaign  11.08.2019          2420                83633.0  71286.0   

   Website Visits  Searches Received  Content Viewed  Added to Cart  Purchases  
0          7016.0             2290.0          2159.0         1819.0      618.0  
1          3038.0             1946.0          1069.0          894.0      255.0  
2          8125.0             2312.0          1804.0          424.0      275.0  
3          2277.0             2475.0          1984.0         1629.0      734.0  
4          3750.0             2893.0          2617.0         1075.0      668.0  
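
Note that the output above is ordered 1.08, 10.08, 11.08: the `Date` column holds strings, so `sort_values` orders it alphabetically rather than chronologically. The sketch below shows one way to get a chronological order, assuming the dates are written as day.month.year as the sample rows suggest. The small `demo` frame is made up for illustration; in the notebook the same conversion would be applied to `ab_data["Date"]`.

```python
import pandas as pd

# Tiny illustrative frame (not the real dataset).
demo = pd.DataFrame({
    "Campaign Name": ["Test Campaign", "Control Campaign", "Test Campaign"],
    "Date": ["10.08.2019", "1.08.2019", "2.08.2019"],
})

# Sorting the raw strings compares them character by character.
string_order = demo.sort_values("Date")["Date"].tolist()

# Parse the strings as dates (assumed format: day.month.year), then sort.
demo["Date"] = pd.to_datetime(demo["Date"], format="%d.%m.%Y")
chronological = demo.sort_values("Date").reset_index(drop=True)
date_order = chronological["Date"].dt.day.tolist()

print(string_order)
print(chronological)
```

With parsed dates, any later time-based grouping or plotting also treats the column as a real timeline.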




In [28]:
#For a fair comparison, each campaign should have the same number of rows (one per day), so check the counts
print(ab_data["Campaign Name"].value_counts())

Control Campaign    30
Test Campaign       30
Name: Campaign Name, dtype: int64


### 4. Visualizations
Analyze the relationships between the campaign metrics.

#### Graph 1: Number of Impressions to Amount Spent 

In [36]:
figure=px.scatter(data_frame = ab_data,
                  x="Number of Impressions",
                  y="Amount Spent", 
                  size="Amount Spent", #the larger the circle, the greater the amount spent
                  color="Campaign Name", #different colors based on which campaign
                  trendline="lowess") #lowess = locally weighted smoothing (a flexible curve); "ols" = ordinary least squares (a straight-line fit). Both need the statsmodels package
figure.show()

The graph above suggests that the control campaign received more impressions. Now let's look at the searches received by both campaigns using a pie chart.
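
The impression comparison can also be checked numerically instead of read off the scatter plot. The sketch below groups by campaign and sums the impressions. To stay self-contained it uses only the four rows for 1 and 10 August from the merged output printed earlier, so the numbers are illustrative; in the notebook the same `groupby` would be run on the full `ab_data`.

```python
import pandas as pd

# Four rows copied from the merged output above (1.08 and 10.08 only).
sample = pd.DataFrame({
    "Campaign Name": ["Control Campaign", "Test Campaign",
                      "Test Campaign", "Control Campaign"],
    "Number of Impressions": [82702.0, 39550.0, 95054.0, 117624.0],
    "Amount Spent": [2280, 3008, 2790, 2149],
})

# Total and average impressions per campaign.
summary = sample.groupby("Campaign Name")["Number of Impressions"].agg(["sum", "mean"])
print(summary)
```

A per-campaign table like this makes the comparison explicit, whereas the scatter plot only shows it indirectly.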

#### Graph 2: Total Searches Pie Chart 

In [54]:
label = ["Total Searches from Control Campaign",
        "Total Searches from Test Campaign"]
counts = [sum(control_data["Searches Received"]),
        sum(test_data["Searches Received"])]
colors = ['magenta','lightgreen']
pie_chart = go.Figure(data=[go.Pie(labels=label, values=counts)])
pie_chart.update_layout(title_text='Control Vs Test: Searches')
pie_chart.update_traces(hoverinfo='label+percent', textinfo='value',
                 textfont_size=30,
                 marker=dict(colors=colors,
                            line=dict(color='black', width=3)))
pie_chart.show()

In [41]:
## These are the named color scales (continuous gradients) available in plotly. The pie chart slices themselves take CSS color names or hex codes, such as 'magenta' or 'lightgreen'
plotly_colorscales = px.colors.named_colorscales()
  
# printing color scales
print(plotly_colorscales)

['aggrnyl', 'agsunset', 'blackbody', 'bluered', 'blues', 'blugrn', 'bluyl', 'brwnyl', 'bugn', 'bupu', 'burg', 'burgyl', 'cividis', 'darkmint', 'electric', 'emrld', 'gnbu', 'greens', 'greys', 'hot', 'inferno', 'jet', 'magenta', 'magma', 'mint', 'orrd', 'oranges', 'oryel', 'peach', 'pinkyl', 'plasma', 'plotly3', 'pubu', 'pubugn', 'purd', 'purp', 'purples', 'purpor', 'rainbow', 'rdbu', 'rdpu', 'redor', 'reds', 'sunset', 'sunsetdark', 'teal', 'tealgrn', 'turbo', 'viridis', 'ylgn', 'ylgnbu', 'ylorbr', 'ylorrd', 'algae', 'amp', 'deep', 'dense', 'gray', 'haline', 'ice', 'matter', 'solar', 'speed', 'tempo', 'thermal', 'turbid', 'armyrose', 'brbg', 'earth', 'fall', 'geyser', 'prgn', 'piyg', 'picnic', 'portland', 'puor', 'rdgy', 'rdylbu', 'rdylgn', 'spectral', 'tealrose', 'temps', 'tropic', 'balance', 'curl', 'delta', 'oxy', 'edge', 'hsv', 'icefire', 'phase', 'twilight', 'mrybm', 'mygbm']


#### Graph 3: Amount Spent Pie Chart

In [57]:
label = ["Amount Spent in Control Campaign",
        "Amount Spent in Test Campaign"]
counts = [sum(control_data["Amount Spent"]),
         sum(test_data["Amount Spent"])]
colors = ['gold','seagreen']
pie_chart2 = go.Figure(data=[go.Pie(labels=label, values=counts)])
pie_chart2.update_layout(title_text='Control Vs Test: Amount Spent')
pie_chart2.update_traces(hoverinfo='label+percent', textinfo='value',
                        textfont_size=30,
                        marker=dict(colors=colors,
                                   line=dict(color='black',width=3)))
pie_chart2.show()

#### Graph 4: Added to Cart

In [61]:
label = ["Products Added to Cart from Control Campaign", 
         "Products Added to Cart from Test Campaign"]
counts = [sum(control_data["Added to Cart"]), 
          sum(test_data["Added to Cart"])]
colors = ['purple','teal']
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Control Vs Test: Added to Cart')
fig.update_traces(hoverinfo='label+percent', textinfo='value', 
                  textfont_size=30,
                  marker=dict(colors=colors, 
                              line=dict(color='black', width=3)))
fig.show()

#### Graph 5: Content Viewed vs Website Visits

In [63]:
figure = px.scatter(data_frame = ab_data, 
                    x="Content Viewed",
                    y="Website Visits", 
                    size="Website Visits", 
                    color= "Campaign Name", 
                    trendline="ols")
figure.show()

#### Graph 6: Content Viewed vs Added to Cart

In [64]:
figure = px.scatter(data_frame = ab_data, 
                    x="Added to Cart",
                    y="Content Viewed", 
                    size="Added to Cart", 
                    color= "Campaign Name", 
                    trendline="ols")
figure.show()

#### Graph 7: Added to Cart vs Purchases

In [65]:
figure = px.scatter(data_frame = ab_data, 
                    x="Purchases",
                    y="Added to Cart", 
                    size="Purchases", 
                    color= "Campaign Name", 
                    trendline="ols")
figure.show()

### 5. Conclusion

From the A/B comparison above, the control campaign resulted in higher overall sales and engagement from visitors. However, visitors in the test campaign moved from the cart to completed checkout more frequently, so its conversion rate is higher: relative to the products viewed and added to the cart, the test campaign produced more sales.

TL;DR:
- Test: use this campaign to market a specific product or reach a specific audience.
- Control: use this campaign to market multiple products to a wider audience.
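
The conversion-rate claim in the conclusion can be made concrete by dividing total purchases by total cart additions for each campaign. The sketch below does this for the four rows for 1 and 10 August printed in the merged output earlier. That sample is far too small to confirm the conclusion on its own, and the `Cart-to-Purchase Rate` column name is introduced here for illustration. In the notebook the same calculation would be run on the full `ab_data`.

```python
import pandas as pd

# Four rows copied from the merged output above (1.08 and 10.08 only).
sample = pd.DataFrame({
    "Campaign Name": ["Control Campaign", "Test Campaign",
                      "Test Campaign", "Control Campaign"],
    "Added to Cart": [1819.0, 894.0, 424.0, 1629.0],
    "Purchases": [618.0, 255.0, 275.0, 734.0],
})

# Sum both metrics per campaign, then take the ratio of the totals.
totals = sample.groupby("Campaign Name")[["Added to Cart", "Purchases"]].sum()
totals["Cart-to-Purchase Rate"] = totals["Purchases"] / totals["Added to Cart"]
print(totals)
```

Taking the ratio of the totals, rather than averaging daily ratios, keeps days with few cart additions from dominating the result.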