# Chi-Square Test

### Test the Response Between Two Different Outcomes

Two Mailers:
- Mailer 1: Basic Mailer
- Mailer 2: Fancy Mailer (More Costly)

In [19]:
import pandas as pd
from scipy.stats import chi2_contingency, chi2

In [20]:
# import data
campaign_data = pd.read_excel('grocery_database.xlsx', sheet_name = 'campaign_data')

In [21]:
# filter data (remove control)
campaign_data = campaign_data.loc[campaign_data['mailer_type'] != 'Control']
campaign_data.head()

Unnamed: 0,customer_id,campaign_name,campaign_date,mailer_type,signup_flag
0,74,delivery_club,2020-07-01,Mailer1,1
1,524,delivery_club,2020-07-01,Mailer1,1
2,607,delivery_club,2020-07-01,Mailer2,1
3,343,delivery_club,2020-07-01,Mailer1,0
4,322,delivery_club,2020-07-01,Mailer2,1


In [22]:
# summarize to get observed frequencies
observed_values = pd.crosstab(campaign_data['mailer_type'], campaign_data['signup_flag']).values # returns array
print(f"Observed Values Matrix:\n{observed_values}")
print('\n')

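# calculate signup rates from the observed counts (rows: Mailer1, Mailer2; columns: signup_flag 0, 1)
mailer1_signup_rate = observed_values[0, 1] / observed_values[0].sum()
mailer2_signup_rate = observed_values[1, 1] / observed_values[1].sum()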

print(f'Mailer 1 Sign Up Rate: {round(mailer1_signup_rate, 3)}')
print(f'Mailer 2 Sign Up Rate: {round(mailer2_signup_rate, 3)}')

Observed Values Matrix:
[[252 123]
 [209 127]]


Mailer 1 Sign Up Rate: 0.328
Mailer 2 Sign Up Rate: 0.378


In [23]:
# state hypotheses and set acceptance criteria (significance level)
H_o = 'There is no relationship between mailer type and signup rate. They are independent.' # null
H_a = 'There is a relationship between mailer type and signup rate. They are not independent.' # alternate
acceptance_criteria = 0.05

print(f'Null Hypothesis: {H_o}')
print(f'Alternate Hypothesis: {H_a} \n')
print(f'Acceptance Criteria: {acceptance_criteria}')

Null Hypothesis: There is no relationship between mailer type and signup rate. They are independent.
Alternate Hypothesis: There is a relationship between mailer type and signup rate. They are not independent. 

Acceptance Criteria: 0.05


In [24]:
# calculate chi-square statistic and p-value (Yates' correction only applies when dof = 1; it is disabled here)
chi2_statistic, p_value, dof, expected_value = chi2_contingency(observed_values, correction = False)

print(f'P-Value: {round(p_value, 4)}')
print(f'Degrees of Freedom: {dof}')
print(f'Chi^2 Statistic: {round(chi2_statistic, 4)}')

P-Value: 0.1635
Degrees of Freedom: 1
Chi^2 Statistic: 1.9414
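
To make the reported statistic less of a black box, the sketch below recomputes it by hand from the observed counts: expected counts under independence are (row total x column total) / grand total, and the statistic is the sum of (observed - expected)^2 / expected over all cells. The variable names (`observed`, `expected`, `manual_statistic`, and so on) are illustrative and not part of the original notebook, and the counts are copied from the printed matrix rather than read from the spreadsheet.

```python
import numpy as np
from scipy.stats import chi2

# Observed counts copied from the crosstab output above
# rows: Mailer1, Mailer2; columns: signup_flag = 0, 1
observed = np.array([[252, 123],
                     [209, 127]])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 2)
grand_total = observed.sum()

# Expected counts if mailer type and signup were independent
expected = row_totals @ col_totals / grand_total

# Pearson chi-square statistic (no continuity correction)
manual_statistic = ((observed - expected) ** 2 / expected).sum()
manual_dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# p-value is the upper-tail area of the chi-square distribution
manual_p_value = chi2.sf(manual_statistic, manual_dof)

print(f'Expected Values Matrix:\n{np.round(expected, 2)}')
print(f'Chi^2 Statistic: {round(manual_statistic, 4)}')
print(f'Degrees of Freedom: {manual_dof}')
print(f'P-Value: {round(manual_p_value, 4)}')
```

Run against these counts, the manual values should agree with what `chi2_contingency(observed_values, correction = False)` reported above.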


In [25]:
# print results (using p-value)
if p_value <= acceptance_criteria:
    print(f'''As our p-value of {round(p_value, 4)} is lower than our acceptance criteria of {acceptance_criteria}, we reject the null hypothesis, and conclude that: 
    {H_a}
    ''')
else:
    print(f'''As our p-value of {round(p_value, 4)} is higher than our acceptance criteria of {acceptance_criteria}, we fail to reject the null hypothesis: 
    {H_o}
    ''')

As our p-value of 0.1635 is higher than our acceptance criteria of 0.05, we fail to reject the null hypothesis: 
    There is no relationship between mailer type and signup rate. They are independent.
    


In [26]:
# find the critical value for our test (alternate way of determining test results)
critical_value = chi2.ppf(1 - acceptance_criteria, dof)
print(f'Chi^2 Statistic: {round(chi2_statistic, 4)}')
print(f'Critical Value: {round(critical_value, 4)}')

Chi^2 Statistic: 1.9414
Critical Value: 3.8415


In [27]:
# print results (using chi-square statistic)
if chi2_statistic >= critical_value:
    print(f'''As our chi-square statistic of {round(chi2_statistic, 4)} is higher than our critical value of {round(critical_value, 4)}, we reject the null hypothesis, and conclude that: 
    {H_a}
    ''')
else:
    print(f'''As our chi-square statistic of {round(chi2_statistic, 4)} is lower than our critical value of {round(critical_value, 4)}, we fail to reject the null hypothesis: 
    {H_o}
    ''')

As our chi-square statistic of 1.9414 is lower than our critical value of 3.8415, we fail to reject the null hypothesis: 
    There is no relationship between mailer type and signup rate. They are independent.
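
The p-value rule and the critical-value rule always give the same decision, because `chi2.ppf` (inverse CDF) and `chi2.sf` (upper-tail area) are two views of the same distribution. The following is a minimal sketch of that equivalence; `alpha`, `statistic`, and the `reject_by_*` names are illustrative, and the statistic is the rounded value printed above rather than the full-precision one.

```python
from scipy.stats import chi2

alpha = 0.05        # same value as acceptance_criteria above
dof = 1
statistic = 1.9414  # chi-square statistic reported above (rounded)

# Critical value: the point with alpha of the distribution to its right
critical_value = chi2.ppf(1 - alpha, dof)

# p-value: the tail area to the right of the observed statistic
p_value = chi2.sf(statistic, dof)

# The two decision rules
reject_by_p_value = p_value <= alpha
reject_by_statistic = statistic >= critical_value

# The tail area beyond the critical value recovers alpha
tail_at_critical = chi2.sf(critical_value, dof)

print(f'Reject by p-value: {reject_by_p_value}')
print(f'Reject by statistic: {reject_by_statistic}')
print(f'Tail area at critical value: {round(tail_at_critical, 4)}')
```

Either rule can therefore be reported on its own; showing both is mainly a consistency check.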
    


Even though the signup rate for Mailer 2 is higher (0.378 versus 0.328), the difference is not statistically significant at our acceptance criteria of 0.05. This does not mean there is no difference; it means the data do not provide strong enough evidence of one to justify switching to the more expensive mailer permanently.
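
One way to see why "not significant" is different from "no difference" is to look at an interval estimate for the gap in signup rates. The sketch below uses a normal-approximation (Wald) interval for the difference between two proportions. This is a supplementary technique, not part of the original analysis, and all variable names are illustrative; the counts are copied from the observed values matrix.

```python
import math
from scipy.stats import norm

# Counts copied from the observed values matrix above
mailer1_signups, mailer1_total = 123, 252 + 123
mailer2_signups, mailer2_total = 127, 209 + 127

rate1 = mailer1_signups / mailer1_total
rate2 = mailer2_signups / mailer2_total
rate_difference = rate2 - rate1

# Standard error of the difference between two independent proportions
standard_error = math.sqrt(rate1 * (1 - rate1) / mailer1_total
                           + rate2 * (1 - rate2) / mailer2_total)

z = norm.ppf(0.975)  # multiplier for a two-sided 95% interval
ci_lower = rate_difference - z * standard_error
ci_upper = rate_difference + z * standard_error

print(f'Difference in signup rate (Mailer 2 - Mailer 1): {rate_difference:.3f}')
print(f'Approximate 95% interval: ({ci_lower:.3f}, {ci_upper:.3f})')
```

With these counts the interval spans zero, which is consistent with the chi-square result: the data are compatible with no uplift, but also with a moderate one.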