# Assessing Campaign Performance using Chi-Square Test for Independence

# Table of contents

- [01. Project Overview](#overview-main)
- [02. Approach](#approach-main)
    - [Import and Understand Data](#import-data)
    - [Establish Hypotheses and Acceptance Criteria](#hypothesis-acceptance)
    - [Calculate Observed and Expected Frequencies](#calculation)
    - [Results & Interpretation](#results)
    - [Discussion & Conclusion](#discussion)
- [03. Concept Overview](#concept-overview)

# Project Overview <a name="overview-main"></a>

Earlier in the year, our client, a grocery retailer, ran a campaign to promote their new "Delivery Club" - an initiative that costs a customer 100 dollars per year for membership, but offers free grocery deliveries rather than the normal cost of 10 dollars per delivery.

For the campaign promoting the club, customers were put randomly into three groups - the first group received a low quality, low cost mailer, the second group received a high quality, high cost mailer, and the third group were a control group, receiving no mailer at all.

The client knows that customers who were contacted signed up for the Delivery Club at a far higher rate than the control group, but now wants to understand whether there is a significant difference in signup rate between the cheap mailer and the expensive mailer.  This will allow them to make more informed decisions in the future, with the overall aim of optimising campaign ROI!



# Approach <a name="approach-main"></a>

To answer this question, we apply the Chi-Square Test For Independence (a type of Hypothesis Test) to assess the performance of the two mailers that were sent out to promote the new service.

We chose this test because the question is about comparing the signup *rates* of two groups.  Full details of the test can be found in the Concept Overview section below.


## Import and understand data <a name="import-data"></a>

In [1]:
import pandas as pd
from scipy.stats import chi2_contingency, chi2

In [2]:
campaign_data = pd.read_excel("C:/Users/Ibiene/OneDrive/DataScience_MachineLearning/Data Science Infinity/Machine Learning/Model Building/data/grocery_database.xlsx", sheet_name = 'campaign_data')

In [3]:
campaign_data.head()

| | customer_id | campaign_name | campaign_date | mailer_type | signup_flag |
|---|---|---|---|---|---|
| 0 | 74 | delivery_club | 2020-07-01 | Mailer1 | 1 |
| 1 | 524 | delivery_club | 2020-07-01 | Mailer1 | 1 |
| 2 | 607 | delivery_club | 2020-07-01 | Mailer2 | 1 |
| 3 | 343 | delivery_club | 2020-07-01 | Mailer1 | 0 |
| 4 | 322 | delivery_club | 2020-07-01 | Mailer2 | 1 |


In [4]:
#check for the different groups in mailer type
campaign_data.mailer_type.value_counts()

Mailer1    375
Mailer2    336
Control    159
Name: mailer_type, dtype: int64

In [5]:
#remove customers who were in the control group
campaign_data = campaign_data[campaign_data.mailer_type != "Control"]

In [6]:
campaign_data.mailer_type.value_counts()

Mailer1    375
Mailer2    336
Name: mailer_type, dtype: int64

In [7]:
pd.pivot_table(campaign_data, index = ['mailer_type'], values = ['customer_id'], aggfunc = 'count')

| mailer_type | customer_id |
|---|---|
| Mailer1 | 375 |
| Mailer2 | 336 |


In [8]:
pd.pivot_table(campaign_data, index = ['mailer_type'], values = ['signup_flag'])

| mailer_type | signup_flag |
|---|---|
| Mailer1 | 0.328 |
| Mailer2 | 0.377976 |


**Mailer 2 appears to have a higher signup rate at 37.8% - is that significantly different from Mailer 1 at 32.8%, or is it just random chance?**

## Establish null, alternate hypotheses, and acceptance criteria <a name="hypothesis-acceptance"></a>

In [9]:
null_hypothesis = "There is no relationship between mailer type and signup rate. They are independent"
alternate_hypothesis = "There is a relationship between mailer type and signup rate. They are not independent"
acceptance_criteria = 0.05

## Calculate observed frequencies and expected frequencies <a name="calculation"></a>
Note that **observed frequencies** are the true values we've seen, that is, the actual counts of signups and non-signups per group in the data itself. 
The **expected frequencies** are the counts we would expect to see in each cell if both groups shared the overall signup rate of all of the data combined. 

Expected frequency = (row sum x column sum) / table sum
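As a minimal sketch of the formula above, the expected frequencies can be reproduced by hand with NumPy. The counts are the observed values from this analysis; the variable names are illustrative only and are not part of the original notebook.

```python
import numpy as np

# Observed counts from this analysis:
# rows = Mailer1, Mailer2; columns = signup_flag 0, signup_flag 1
observed = np.array([[252, 123],
                     [209, 127]])

row_sums = observed.sum(axis=1)   # customers per mailer type
col_sums = observed.sum(axis=0)   # non-signups and signups across both mailers
table_sum = observed.sum()        # all customers in the test

# Expected frequency = (row sum x column sum) / table sum, for every cell at once
expected = np.outer(row_sums, col_sums) / table_sum
print(expected)
```

These values should agree with the `expected_values` array that `chi2_contingency` returns for the same table.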


The code below summarises the dataset into a 2x2 matrix of **mailer_type** (rows) by **signup_flag** (columns).


In [10]:
observed_values = pd.crosstab(campaign_data.mailer_type, campaign_data.signup_flag).values
observed_values

array([[252, 123],
       [209, 127]], dtype=int64)

In [11]:
chi2_statistic, p_value, dof, expected_values = chi2_contingency(observed_values, correction = False)

**Note:** When applying the Chi-Square Test above, we set the parameter *correction = False*, which means we are *not* applying what is known as *Yates' Correction*.  By default, SciPy applies this continuity correction when the Degrees of Freedom is equal to one, as it is for a 2x2 table.  The correction makes the test more conservative to guard against overestimating statistical significance; leaving it off gives the standard, uncorrected chi-square statistic.
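To see what the `correction` parameter changes, the sketch below runs the test both ways on the same observed counts. It is an illustration rather than part of the original notebook, and the variable names are assumptions.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts from this analysis (rows = Mailer1, Mailer2)
observed = np.array([[252, 123],
                     [209, 127]])

# Uncorrected test, as used in this analysis
stat_plain, p_plain, dof, _ = chi2_contingency(observed, correction=False)

# Same test with Yates' continuity correction (SciPy's default for dof == 1)
stat_yates, p_yates, _, _ = chi2_contingency(observed, correction=True)

print(f"dof: {dof}")
print(f"uncorrected: statistic={stat_plain:.4f}, p-value={p_plain:.4f}")
print(f"with Yates:  statistic={stat_yates:.4f}, p-value={p_yates:.4f}")
```

The corrected statistic is smaller and its p-value larger, so for this table the choice of correction does not change the decision at 0.05.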


## Results and Interpretation <a name="results"></a>

In [12]:
print(chi2_statistic)

1.9414468614812481


In [13]:
print(p_value)

0.16351152223398197


In [14]:
print(expected_values)

[[243.14345992 131.85654008]
 [217.85654008 118.14345992]]


In [15]:
critical_value = chi2.ppf(1-acceptance_criteria, dof)

In [16]:
print(critical_value)

3.841458820694124


In [17]:
# print results based on p-value
if p_value <= acceptance_criteria:
    print(f"As our p-value of {p_value} is lower than our acceptance_criteria of {acceptance_criteria} - we reject the null hypothesis, and conclude that: {alternate_hypothesis}")
else:
    print(f"As our p-value of {p_value} is higher than our acceptance_criteria of {acceptance_criteria} - we retain the null hypothesis, and conclude that: {null_hypothesis}")


As our p-value of 0.16351152223398197 is higher than our acceptance_criteria of 0.05 - we retain the null hypothesis, and conclude that: There is no relationship between mailer type and signup rate. They are independent


In [18]:
# print results based on chi2-value
if chi2_statistic >= critical_value:
    print(f"As our chi-square statistic of {chi2_statistic} is higher than our critical value of {critical_value} - we reject the null hypothesis, and conclude that: {alternate_hypothesis}")
else:
    print(f"As our chi-square statistic of {chi2_statistic} is lower than our critical value of {critical_value} - we retain the null hypothesis, and conclude that: {null_hypothesis}")
    

As our chi-square statistic of 1.9414468614812481 is lower than our critical value of 3.841458820694124 - we retain the null hypothesis, and conclude that: There is no relationship between mailer type and signup rate. They are independent
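The p-value check and the critical-value check are two views of the same comparison, so they should always lead to the same decision. The sketch below illustrates this using the statistic printed above; the variable names are illustrative.

```python
from scipy.stats import chi2

chi2_statistic = 1.9414468614812481   # statistic printed earlier in this analysis
dof = 1
acceptance_criteria = 0.05

# The p-value is the upper-tail area of the chi-square distribution beyond the statistic
p_from_statistic = chi2.sf(chi2_statistic, dof)

# The critical value is the point whose upper-tail area equals the acceptance criteria
critical_value = chi2.ppf(1 - acceptance_criteria, dof)
tail_at_critical = chi2.sf(critical_value, dof)

reject_by_p = p_from_statistic <= acceptance_criteria
reject_by_statistic = chi2_statistic >= critical_value

print(p_from_statistic, critical_value, tail_at_critical)
print(reject_by_p, reject_by_statistic)
```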


## Discussion and Conclusion <a name="discussion"></a>

As we can see from the outputs of these print statements, we do indeed retain the null hypothesis.  We could not find enough evidence that the signup rates for Mailer 1 and Mailer 2 were different - and thus conclude that there was no significant difference.

While we saw that the higher cost Mailer 2 had a higher signup rate (37.8%) than the lower cost Mailer 1 (32.8%) it appears that this difference is not significant, at least at our Acceptance Criteria of 0.05.

Without running this Hypothesis Test, the client may have concluded that they should always look to go with higher cost mailers - and from what we've seen in this test, that may not be a great decision.  It would result in them spending more, but not *necessarily* gaining any extra revenue as a result.

Our results here also do not say that there *definitely isn't a difference between the two mailers* - we are only advising that we should not make any rigid conclusions *at this point*.  

Running more A/B Tests like this, gathering more data, and then re-running this test may provide us and the client with more insight!



# Concept Overview  <a name="concept-overview"></a>

## A/B Testing

An A/B Test can be described as a randomised experiment containing two groups, A & B, that receive different experiences. Within an A/B Test, we look to understand and measure the response of each group - and the information from this helps drive future business decisions.

Application of A/B testing can range from testing different online ad strategies, different email subject lines when contacting customers, or testing the effect of mailing customers a coupon, vs a control group.  Companies like Amazon are running these tests in an almost never-ending cycle, testing new website features on randomised groups of customers...all with the aim of finding what works best so they can stay ahead of their competition.  Reportedly, Netflix will even test different images for the same movie or show, to different segments of their customer base to see if certain images pull more viewers in.


## Hypothesis Testing

A Hypothesis Test is used to assess the plausibility, or likelihood, of an assumed viewpoint based on sample data - in other words, it helps us assess whether a certain view we have about some data is likely to be true or not.

There are many different scenarios we can run Hypothesis Tests on, and they all have slightly different techniques and formulas - however they all have some shared, fundamental steps & logic that underpin how they work.


**The Null Hypothesis**

In any Hypothesis Test, we start with the Null Hypothesis. The Null Hypothesis is where we state our initial viewpoint, and in statistics, and specifically Hypothesis Testing, our initial viewpoint is always that the result is purely by chance, or that there is no relationship or association between two outcomes or groups.


**The Alternate Hypothesis**

The aim of the Hypothesis Test is to look for evidence to support or reject the Null Hypothesis.  If we reject the Null Hypothesis, that would mean we’d be supporting the Alternate Hypothesis.  The Alternate Hypothesis is essentially the opposite viewpoint to the Null Hypothesis - that the result is *not* by chance, or that there *is* a relationship between two outcomes or groups.


**The Acceptance Criteria**

In a Hypothesis Test, before we collect any data or run any numbers - we specify an Acceptance Criteria.  This is a p-value threshold at which we’ll decide to reject or retain the null hypothesis.  It is essentially a line we draw in the sand saying "if there really was no difference and I was to run this test many many times, how rarely would a result like mine need to come out by chance alone, in order for me to feel comfortable, or confident, that my result is not just some unusual occurrence"

Conventionally, we set our Acceptance Criteria to 0.05 - but this does not have to be the case.  If we need to be more confident that something did not occur through chance alone, we could lower this value down to something much smaller, meaning that we only come to the conclusion that the outcome was special or rare if it’s extremely rare.

So to summarise, in a Hypothesis Test, we test the Null Hypothesis using a p-value and then decide its fate based on the Acceptance Criteria.
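One way to build intuition for the Acceptance Criteria is a small simulation in which the Null Hypothesis is true by construction. The sketch below is illustrative only: the shared signup rate of 0.35, the seed, and the number of trials are assumptions, while the group sizes match this analysis. If the threshold behaves as described, roughly 5% of these simulated tests should reject the Null Hypothesis purely by chance.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(42)     # fixed seed so the sketch is repeatable
n_mailer1, n_mailer2 = 375, 336     # group sizes from this analysis
true_rate = 0.35                    # hypothetical rate shared by both groups (null is true)
acceptance_criteria = 0.05
n_trials = 2000                     # number of simulated campaigns (assumption)

false_positives = 0
for _ in range(n_trials):
    # Simulate signups for both groups from the same underlying rate
    signups1 = rng.binomial(n_mailer1, true_rate)
    signups2 = rng.binomial(n_mailer2, true_rate)
    table = np.array([[n_mailer1 - signups1, signups1],
                      [n_mailer2 - signups2, signups2]])
    _, p_value, _, _ = chi2_contingency(table, correction=False)
    if p_value <= acceptance_criteria:
        false_positives += 1

false_positive_rate = false_positives / n_trials
print(f"Share of simulated tests rejecting a true null: {false_positive_rate:.3f}")
```

Lowering `acceptance_criteria` in this sketch reduces how often a true Null Hypothesis is rejected, which is the trade-off described above.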


**Types Of Hypothesis Test**

There are many different types of Hypothesis Tests, each of which is appropriate for use in differing scenarios - depending on a) the type of data that you’re looking to test and b) the question that you’re asking of that data.

In the case of our task here, where we are looking to understand the difference in sign-up *rate* between two groups - we will utilise the Chi-Square Test For Independence.


## Chi-Square Test For Independence

The Chi-Square Test For Independence is a type of Hypothesis Test that assumes observed frequencies for categorical variables will match the expected frequencies.

The *assumption* is the Null Hypothesis, which as discussed above is always the viewpoint that the two groups will be equal.  With the Chi-Square Test For Independence we look to calculate a statistic which, based on the specified Acceptance Criteria will mean we either reject or support this initial assumption.

The *observed frequencies* are the true values that we’ve seen.

The *expected frequencies* are essentially what we would *expect* to see based on all of the data.
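The chi-square statistic measures how far the observed frequencies sit from the expected ones: for each cell we take (observed - expected)² / expected, then sum across all cells. The sketch below applies this to the values from this analysis; the expected values are copied from the printed output, so the result matches only up to rounding.

```python
import numpy as np

# Observed and expected frequencies from this analysis
observed = np.array([[252, 123],
                     [209, 127]])
expected = np.array([[243.14345992, 131.85654008],
                     [217.85654008, 118.14345992]])

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected
chi2_by_hand = ((observed - expected) ** 2 / expected).sum()
print(chi2_by_hand)
```

The result should agree with the statistic reported by `chi2_contingency` with `correction = False`.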

**Note:** Another option when comparing "rates" is a test known as the *Z-Test For Proportions*.  While we could absolutely use this test here, we have chosen the Chi-Square Test For Independence because:

* The two tests give equivalent results for two groups - the Chi-Square statistic is the square of the z statistic, and the p-values are the same
* The Chi-Square Test can be represented using 2x2 tables of data - meaning it can be easier to explain to stakeholders
* The Chi-Square Test can extend out to more than 2 groups - meaning the business can have one consistent approach to measuring significance
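The link between the two tests can be checked by hand. The sketch below computes a two-sided Z-Test For Proportions with a pooled standard error, using the signup counts from this analysis, and compares the squared z statistic with the Chi-Square statistic. It is an illustration with assumed variable names, not part of the original analysis.

```python
import math
from scipy.stats import norm

# Group sizes and signups from this analysis
n1, signups1 = 375, 123   # Mailer1
n2, signups2 = 336, 127   # Mailer2

rate1 = signups1 / n1
rate2 = signups2 / n2

# Pooled signup rate under the null hypothesis of no difference
pooled_rate = (signups1 + signups2) / (n1 + n2)
standard_error = math.sqrt(pooled_rate * (1 - pooled_rate) * (1 / n1 + 1 / n2))

z_statistic = (rate2 - rate1) / standard_error
p_value_z = 2 * norm.sf(abs(z_statistic))   # two-sided p-value

print(f"z statistic:   {z_statistic:.6f}")
print(f"z squared:     {z_statistic ** 2:.6f}")
print(f"p-value (z):   {p_value_z:.6f}")
```

The squared z statistic and the p-value should line up with the uncorrected Chi-Square results reported earlier.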

___

