# Analyzing Promotional Effectiveness with A/B Testing

To increase customer acquisition for the "Delivery Club" membership of a retail store, we test the impact of different promotional mailers on signup rates. Two mailer designs, basic (Mailer 1) and premium (Mailer 2), were evaluated alongside a control group that received no mailer. The goal is to determine how effective the mailers are and to assess whether the premium version justifies its higher cost. Identifying this difference matters for optimizing marketing spend and keeping customer acquisition cost-effective.

For this business challenge, we conduct an A/B test, a statistical approach for comparing two variations (Mailer 1 and Mailer 2) by analyzing the difference in their signup rates. Here the comparison is made with a chi-square test for independence. This gives a data-driven answer to whether the premium mailer significantly outperforms the basic version.

### A/B Test

An A/B test is a randomised experiment with two groups, A and B, that receive different experiences.
We then measure and compare the response of each group.

### Import required packages

In [5]:
import pandas as pd
from scipy.stats import chi2_contingency, chi2 

`chi2_contingency` computes the chi-square statistic, the p-value, the degrees of freedom, and the expected frequencies for a contingency table.

`chi2` (the chi-square distribution) lets us find the critical value for our acceptance criteria.

### Importing and overview of the data

In [8]:
#Importing the data
campaign_data = pd.read_excel("retail_store_database.xlsx", sheet_name= "campaign_data")

#Overview of the data
campaign_data.info()
campaign_data.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 870 entries, 0 to 869
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   customer_id    870 non-null    int64         
 1   campaign_name  870 non-null    object        
 2   campaign_date  870 non-null    datetime64[ns]
 3   mailer_type    870 non-null    object        
 4   signup_flag    870 non-null    int64         
dtypes: datetime64[ns](1), int64(2), object(2)
memory usage: 34.1+ KB


| | customer_id | campaign_name | campaign_date | mailer_type | signup_flag |
|---|---|---|---|---|---|
| 0 | 74 | delivery_club | 2020-07-01 | Mailer1 | 1 |
| 1 | 524 | delivery_club | 2020-07-01 | Mailer1 | 1 |
| 2 | 607 | delivery_club | 2020-07-01 | Mailer2 | 1 |
| 3 | 343 | delivery_club | 2020-07-01 | Mailer1 | 0 |
| 4 | 322 | delivery_club | 2020-07-01 | Mailer2 | 1 |


### Filtering the data

Rows with mailer type "Control" are excluded because those customers did not receive a mailer, and this test compares only Mailer 1 and Mailer 2.

In [15]:
campaign_data = campaign_data.loc[campaign_data["mailer_type"] != "Control"]
campaign_data

| | customer_id | campaign_name | campaign_date | mailer_type | signup_flag |
|---|---|---|---|---|---|
| 0 | 74 | delivery_club | 2020-07-01 | Mailer1 | 1 |
| 1 | 524 | delivery_club | 2020-07-01 | Mailer1 | 1 |
| 2 | 607 | delivery_club | 2020-07-01 | Mailer2 | 1 |
| 3 | 343 | delivery_club | 2020-07-01 | Mailer1 | 0 |
| 4 | 322 | delivery_club | 2020-07-01 | Mailer2 | 1 |
| ... | ... | ... | ... | ... | ... |
| 863 | 765 | delivery_club | 2020-07-01 | Mailer2 | 1 |
| 864 | 466 | delivery_club | 2020-07-01 | Mailer1 | 1 |
| 865 | 372 | delivery_club | 2020-07-01 | Mailer2 | 1 |
| 866 | 104 | delivery_club | 2020-07-01 | Mailer1 | 1 |


### Summarize to get the observed frequency

In [19]:
observed_values = pd.crosstab(campaign_data["mailer_type"], campaign_data["signup_flag"]).values
print(observed_values)

[[252 123]
 [209 127]]


`crosstab` builds the 2x2 contingency table required for the chi-square test: rows are the mailer types (Mailer1, Mailer2) and columns are the signup flag (0, 1).
`.values` returns the table as a NumPy array.
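Because `.values` drops the row and column labels, it can help to look at the labeled table once before converting it. The sketch below uses a small made-up DataFrame (`toy` is hypothetical, not the campaign data) to show how `crosstab` orders rows and columns.

```python
import pandas as pd

# Hypothetical toy data for illustration only, not the campaign dataset
toy = pd.DataFrame({
    "mailer_type": ["Mailer1", "Mailer1", "Mailer2", "Mailer2", "Mailer1", "Mailer2"],
    "signup_flag": [1, 0, 1, 1, 0, 0],
})

# Labeled contingency table: rows and columns are sorted by label
table = pd.crosstab(toy["mailer_type"], toy["signup_flag"])
print(table)

# Row order and column order that the bare array will follow
row_labels = table.index.tolist()
col_labels = table.columns.tolist()
print(row_labels, col_labels)

# Same counts as a plain NumPy array, as used by the chi-square test
observed = table.values
print(observed)
```

Checking the labels first makes it clear which array position holds the signups for each mailer.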

In [22]:
# Rows: 0 = Mailer1, 1 = Mailer2; columns: 0 = no signup, 1 = signup
# Signup rate = signups / (non-signups + signups) for each mailer
mailer1_signup_rate = round(observed_values[0][1] / (observed_values[0][0] + observed_values[0][1]), 3)
mailer2_signup_rate = round(observed_values[1][1] / (observed_values[1][0] + observed_values[1][1]), 3)
print(f"Customer sign-up rate for Mailer1: {mailer1_signup_rate}\nCustomer sign-up rate for Mailer2: {mailer2_signup_rate}")

Customer sign-up rate for Mailer1: 0.328
Customer sign-up rate for Mailer2: 0.378
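Since `signup_flag` is coded as 0 or 1, the signup rate per mailer is simply the mean of that column within each group. The sketch below is an alternative to indexing into the array; it rebuilds a row-level DataFrame from the observed counts shown above (the `rebuilt` frame is a stand-in, since the original Excel file is not available here).

```python
import pandas as pd

# Observed counts taken from the crosstab output above: (mailer, flag) -> count
counts = {
    ("Mailer1", 0): 252, ("Mailer1", 1): 123,
    ("Mailer2", 0): 209, ("Mailer2", 1): 127,
}

# Expand the counts back into one row per customer (stand-in for campaign_data)
rows = [
    {"mailer_type": mailer, "signup_flag": flag}
    for (mailer, flag), n in counts.items()
    for _ in range(n)
]
rebuilt = pd.DataFrame(rows)

# The mean of a 0/1 flag is the proportion of 1s, i.e. the signup rate
signup_rates = rebuilt.groupby("mailer_type")["signup_flag"].mean().round(3)
print(signup_rates)
```

This gives the same rates as the cell above and keeps the mailer labels attached to each number.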


### State hypothesis & set acceptance criteria

In [25]:
null_hypothesis = "There is no relationship between mailer type and the signup rate. They are independent."
alternative_hypothesis = "There is a relationship between mailer type and the signup rate. They are not independent."
acceptance_criteria = 0.05

### Calculate the expected frequencies & chi-square statistic

In [28]:
# correction=False disables Yates' continuity correction (see note below)
chi2_statistic, p_value, dof, expected_values = chi2_contingency(observed_values, correction=False)
print(chi2_statistic, p_value, dof, expected_values)

1.9414468614812481 0.16351152223398197 1 [[243.14345992 131.85654008]
 [217.85654008 118.14345992]]
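To see where these numbers come from, the expected frequencies and the statistic can be reproduced by hand. The sketch below uses the observed counts printed earlier; each expected count is (row total × column total) / grand total, and the uncorrected Pearson statistic is the sum of (observed − expected)² / expected over the four cells. Variable names here are illustrative.

```python
import numpy as np

# Observed counts from the crosstab output (rows: Mailer1, Mailer2; cols: 0, 1)
observed = np.array([[252, 123], [209, 127]])

row_totals = observed.sum(axis=1)   # customers per mailer
col_totals = observed.sum(axis=0)   # non-signups and signups overall
grand_total = observed.sum()

# Expected count per cell under independence
expected = np.outer(row_totals, col_totals) / grand_total

# Pearson chi-square statistic without continuity correction
chi2_manual = ((observed - expected) ** 2 / expected).sum()

print(expected)
print(chi2_manual)
```

The result should agree with the `chi2_contingency` output above, because that call was made with `correction=False`.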


`correction` controls Yates' continuity correction, which SciPy applies by default when the degrees of freedom equal 1, as they do for a 2x2 table.
Here we set `correction=False` to compute the uncorrected Pearson chi-square statistic.
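To gauge how much this choice matters, the test can be run both ways on the observed counts. This is a minimal sketch for comparison only; it does not change the analysis in this notebook.

```python
from scipy.stats import chi2_contingency

# Observed counts from the crosstab output (rows: Mailer1, Mailer2; cols: 0, 1)
observed = [[252, 123], [209, 127]]

# Uncorrected Pearson chi-square, as used in this analysis
stat_plain, p_plain, dof, _ = chi2_contingency(observed, correction=False)

# With Yates' continuity correction (SciPy's default for 1 degree of freedom)
stat_yates, p_yates, _, _ = chi2_contingency(observed, correction=True)

print(f"Without correction: statistic={stat_plain:.4f}, p-value={p_plain:.4f}")
print(f"With correction:    statistic={stat_yates:.4f}, p-value={p_yates:.4f}")
```

The correction shrinks each |observed − expected| difference by 0.5 before squaring, so it yields a smaller statistic and a larger p-value, making the test more conservative.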

If the p-value is greater than the acceptance criteria (0.05), we retain the null hypothesis.

If the p-value is less than or equal to the acceptance criteria (0.05), we reject the null hypothesis.

### Finding the critical value for the chi-square test

In [32]:
critical_value = chi2.ppf(1 - acceptance_criteria, dof)
print(critical_value)

3.841458820694124


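`ppf` is the percent point function (the inverse of the cumulative distribution function). It returns the critical value of the chi-square distribution for the acceptance criteria: with an acceptance criteria of 0.05, 95% of the distribution lies below this value when the null hypothesis holds.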
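The critical value and the p-value are two views of the same comparison, which is why the two result checks that follow always agree. As a sketch, `chi2.sf` (the survival function, 1 − CDF) converts a statistic back into a tail probability; the statistic below is copied from the output above.

```python
from scipy.stats import chi2

alpha = 0.05   # acceptance criteria
dof = 1        # degrees of freedom for a 2x2 table
chi2_stat = 1.9414468614812481  # statistic reported by chi2_contingency above

# Critical value: the point with probability alpha in the upper tail
critical_value = chi2.ppf(1 - alpha, dof)

# Going back the other way recovers alpha
tail_at_critical = chi2.sf(critical_value, dof)

# The p-value is the upper-tail probability at the observed statistic
p_from_stat = chi2.sf(chi2_stat, dof)

print(critical_value, tail_at_critical, p_from_stat)

# Both decision rules give the same answer
reject_by_statistic = chi2_stat >= critical_value
reject_by_p_value = p_from_stat <= alpha
print(reject_by_statistic, reject_by_p_value)
```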

### Printing the results (Chi-square statistic)

In [36]:
if chi2_statistic >= critical_value:
    print(f"As our chi-square statistic of {chi2_statistic} is higher than our critical value of {critical_value} - we reject the null hypothesis, and conclude that: {alternative_hypothesis}")
else:
    print(f"As our chi-square statistic of {chi2_statistic} is lower than our critical value of {critical_value} - we retain the null hypothesis, and conclude that: {null_hypothesis}")

As our chi-square statistic of 1.9414468614812481 is lower than our critical value of 3.841458820694124 - we retain the null hypothesis, and conclude that: There is no relationship between mailer type and the signup rate. They are independent.


### Printing the results (p-value)

In [39]:
if p_value <= acceptance_criteria:
    print(f"As our p-value of {p_value} is lower than our acceptance criteria of {acceptance_criteria} - we reject the null hypothesis, and conclude that: {alternative_hypothesis}")
else:
    print(f"As our p-value of {p_value} is higher than our acceptance criteria of {acceptance_criteria} - we retain the null hypothesis, and conclude that: {null_hypothesis}")

As our p-value of 0.16351152223398197 is higher than our acceptance criteria of 0.05 - we retain the null hypothesis, and conclude that: There is no relationship between mailer type and the signup rate. They are independent.


### Interpreting the test result

The signup rate for the premium mailer (Mailer 2, 37.8%) was higher than for the basic mailer (Mailer 1, 32.8%), but the difference was not statistically significant: the p-value of 0.164 is above the predefined acceptance criteria of 0.05. The marketing team should therefore avoid drawing strong conclusions about the superiority of the premium mailer. Retaining the null hypothesis does not prove that the two mailers perform identically; it means this sample does not provide enough evidence of a difference. Without the test, the business might have assumed the premium mailer was more effective and paid its higher cost without a demonstrated improvement in signups.

The findings guide the marketing team to adopt a more cost-conscious approach, optimizing resources without compromising customer acquisition goals. This project showcases the practical application of A/B testing to align marketing efforts with business objectives and ensure sustainable revenue growth.