# Analysis Summary
The data includes customer responses to five initial campaigns and to a final campaign. This analysis evaluates the differences in response rates across the campaigns and the general association between the final campaign and the five initial campaigns. The pairwise comparisons between campaigns showed the following:
1. Campaign 2 differs significantly from all other campaigns (fewer customer responses).
2. The final campaign differs significantly from all five other campaigns (more customer responses).
3. Campaigns 1, 3, 4, and 5 do not differ significantly from each other in customer responses.
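Cochran's Q and the two-sided McNemar tests only detect that response rates differ. The "fewer" and "more" directions come from comparing acceptance rates. The sketch below shows one way to get those rates. It assumes each campaign column is a 0/1 acceptance flag, and it uses hypothetical toy data rather than `ifood_df.csv`.

```python
import pandas as pd

# Hypothetical toy data: one row per customer, one 0/1 column per campaign
# (1 = customer accepted that campaign). Column names mirror the iFood data.
toy = pd.DataFrame({
    "AcceptedCmp1": [1, 0, 0, 1, 0],
    "AcceptedCmp2": [0, 0, 0, 0, 0],
    "Response":     [1, 1, 0, 1, 0],
})

# The mean of a 0/1 column is the share of customers who accepted that campaign
response_rates = toy.mean().sort_values()
print(response_rates)
```

On the real data, `ifood_campaign.mean().sort_values()` would give the per-campaign rates behind the directions stated above.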

In terms of associations, responses to the final campaign were commonly associated with prior acceptance of campaigns 1, 4, and 5, alone or in combination (every reported rule has a lift of at least 5).
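The association results are reported as support, confidence, and lift. For a rule A → C, support is P(A and C), confidence is P(C | A), and lift is P(C | A) / P(C). A lift of 5 therefore means customers matching the antecedent were five times as likely to respond as customers overall. The sketch below is a minimal illustration on hypothetical toy data. The helper `rule_metrics` is our own name and is not part of mlxtend.

```python
import pandas as pd

# Hypothetical toy data (0/1 acceptance flags cast to bool); not the iFood dataset
toy = pd.DataFrame({
    "AcceptedCmp1": [1, 1, 0, 0, 1, 0, 0, 0],
    "Response":     [1, 1, 0, 1, 0, 0, 0, 0],
}).astype(bool)

def rule_metrics(df, antecedents, consequents):
    """Support, confidence and lift for the rule antecedents -> consequents.

    antecedents / consequents are lists of column names; a row matches an
    itemset only if every listed column is True (as for multi-item rules).
    """
    a = df[antecedents].all(axis=1)
    c = df[consequents].all(axis=1)
    support = (a & c).mean()          # P(A and C)
    confidence = support / a.mean()   # P(C | A)
    lift = confidence / c.mean()      # P(C | A) / P(C)
    return support, confidence, lift

support, confidence, lift = rule_metrics(toy, ["AcceptedCmp1"], ["Response"])
print(support, confidence, lift)
```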

In [1]:
# import needed packages
import pandas as pd
from itertools import combinations
from statsmodels.stats.contingency_tables import cochrans_q, mcnemar
import statsmodels.stats.multitest as smm
from mlxtend.frequent_patterns import apriori, association_rules

In [2]:
# read in the data
ifood = pd.read_csv('ifood_df.csv')

In [3]:
# subset the data to only include campaign responses
ifood_campaign = ifood[['AcceptedCmp1', 'AcceptedCmp2', 'AcceptedCmp3', 'AcceptedCmp4', 'AcceptedCmp5', 'Response']]

# Cochran's Q test
result = cochrans_q(ifood_campaign)
print(result)
# p < 0.05: at least one campaign has a significantly different response rate

df          5
pvalue      4.2306049840149846e-79
statistic   376.08028807682047
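Cochran's Q compares k paired binary outcomes, here the five initial campaigns plus the final one for each customer. With column totals C_j, row totals R_i, and grand total N, it is Q = (k - 1)(k·ΣC_j² - N²) / (k·N - ΣR_i²). Under the null hypothesis of equal acceptance rates, Q is approximately chi-square with k - 1 degrees of freedom, which matches the `df 5` reported above for six columns. The sketch below computes Q by hand on hypothetical toy data. `cochrans_q_manual` is an illustrative helper, not statsmodels' implementation.

```python
import numpy as np
from scipy.stats import chi2

def cochrans_q_manual(x):
    """Cochran's Q for an (n_subjects, k_treatments) array of 0/1 outcomes."""
    x = np.asarray(x, dtype=float)
    k = x.shape[1]
    col_totals = x.sum(axis=0)   # acceptances per campaign (C_j)
    row_totals = x.sum(axis=1)   # acceptances per customer (R_i)
    n_total = x.sum()            # grand total N
    q = (k - 1) * (k * np.sum(col_totals ** 2) - n_total ** 2) / (
        k * n_total - np.sum(row_totals ** 2)
    )
    dof = k - 1
    # Under H0, Q is approximately chi-square with k - 1 degrees of freedom
    return q, dof, chi2.sf(q, dof)

# Hypothetical toy data: 5 customers x 3 campaigns
toy = [[1, 1, 0],
       [1, 0, 0],
       [1, 1, 1],
       [0, 0, 0],
       [1, 0, 1]]
q, dof, p = cochrans_q_manual(toy)
print(q, dof, p)
```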


In [4]:
# perform a post-hoc pairwise McNemar's test
# create every pair of campaigns to compare
pairs = list(combinations(ifood_campaign.columns, 2))
alpha = 0.05

# perform an exact McNemar test for each campaign pair
p_values = []
for col_a, col_b in pairs:
    table = pd.crosstab(ifood_campaign[col_a], ifood_campaign[col_b])
    result = mcnemar(table, exact=True)
    p_values.append(result.pvalue)

# apply the Bonferroni correction once (reduces family-wise type I error risk):
# adjusted p-values (capped at 1) are compared against the unadjusted alpha
reject, p_values_adj, _, _ = smm.multipletests(p_values, alpha=alpha, method='bonferroni')

# print results of each significant comparison
for pair, p_value_adj, significant in zip(pairs, p_values_adj, reject):
    if significant:
        print(f"{pair}: adjusted p-value={p_value_adj}")

('AcceptedCmp1', 'AcceptedCmp2'): adjusted p-value=2.5656210184884974e-21
('AcceptedCmp1', 'Response'): adjusted p-value=3.7261931882421794e-27
('AcceptedCmp2', 'AcceptedCmp3'): adjusted p-value=2.6469704666496385e-24
('AcceptedCmp2', 'AcceptedCmp4'): adjusted p-value=1.1699925508555891e-31
('AcceptedCmp2', 'AcceptedCmp5'): adjusted p-value=6.117130978207308e-28
('AcceptedCmp2', 'Response'): adjusted p-value=5.365317795079052e-78
('AcceptedCmp3', 'Response'): adjusted p-value=1.4350816567133877e-19
('AcceptedCmp4', 'Response'): adjusted p-value=1.2476848574869235e-17
('AcceptedCmp5', 'Response'): adjusted p-value=3.724635148023631e-22
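McNemar's test uses only the discordant customers: b customers accepted campaign A but not B, and c customers accepted B but not A. With `exact=True`, statsmodels uses the binomial distribution for these counts. The sketch below is a minimal re-derivation of that idea plus a Bonferroni helper. The function names are illustrative and are not statsmodels APIs. Bonferroni is applied once, either by multiplying the p-values by the number of tests m and comparing to alpha, or by comparing raw p-values to alpha / m. The two are equivalent.

```python
from scipy.stats import binom

def exact_mcnemar_pvalue(b, c):
    """Two-sided exact McNemar p-value from the two discordant counts.

    b: customers who accepted campaign A but not B
    c: customers who accepted campaign B but not A
    Concordant customers (accepted both or neither) do not enter the test.
    """
    n = b + c
    # Under H0 each discordant customer is equally likely to fall in b or c
    return min(1.0, 2 * binom.cdf(min(b, c), n, 0.5))

def bonferroni(p_values, alpha=0.05):
    """Multiply by the number of tests (capped at 1) and compare to alpha."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject

print(exact_mcnemar_pvalue(1, 9))
print(bonferroni([0.01, 0.02, 0.5]))
```

Comparing the adjusted p-values against alpha / m as well would apply the correction twice and make the test overly conservative.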


In [5]:
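# association analysis
# find frequent itemsets (combinations of accepted campaigns) with support >= 1%;
# apriori expects boolean one-hot columns, so cast the 0/1 flags to bool
frequent_pairs = apriori(ifood_campaign.astype(bool), min_support=0.01, use_colnames=True)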

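# output the association rules with lift >= 5
# num_itemsets is the number of transactions (customers) in the original data
rules = association_rules(frequent_pairs, num_itemsets=len(ifood_campaign), metric="lift", min_threshold=5.0)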

# filter the association rules to only include consequents with the response variable
rules_response = rules[rules['consequents'].apply(lambda x: 'Response' in x)]

# output the final rules table (sorted by confidence)
rules_table = rules_response[['antecedents', 'consequents', 'support', 'confidence', 'lift']].sort_values(by='confidence', ascending=False)
print(rules_table)

                                   antecedents  \
27  (AcceptedCmp1, AcceptedCmp4, AcceptedCmp5)   
16                (AcceptedCmp1, AcceptedCmp5)   
10                (AcceptedCmp1, AcceptedCmp4)   
31                (AcceptedCmp1, AcceptedCmp4)   
33                (AcceptedCmp4, AcceptedCmp5)   
32                (AcceptedCmp1, AcceptedCmp5)   
18                              (AcceptedCmp1)   
19                              (AcceptedCmp5)   
23                              (AcceptedCmp5)   
22                              (AcceptedCmp4)   
12                              (AcceptedCmp1)   
13                              (AcceptedCmp4)   
35                              (AcceptedCmp1)   
37                              (AcceptedCmp5)   
36                              (AcceptedCmp4)   

                               consequents   support  confidence       lift  
27                              (Response)  0.011791    0.838710   5.553618  
16                              (Response) 

