# Analysis Summary
The data includes customer responses to five initial campaigns and to a final campaign. This analysis evaluates how the response rate of each initial campaign differs from that of the final campaign, and which initial campaigns are associated with responses to the final campaign. The results were as follows:
- the final campaign's response rate differs significantly from each of the five initial campaigns (it drew more customer responses)
- responses to the final campaign were strongly associated with acceptance of campaigns 1, 4, and 5 (every retained association rule has a lift of at least 5)
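The first finding compares response rates, meaning the share of customers who accepted each campaign. On the 0/1 campaign columns this is just `ifood_campaign.mean()` in pandas. The stdlib sketch below shows the same calculation on a few made-up customers; the rows and the `response_rate` helper are hypothetical, not taken from `ifood_df.csv`.

```python
# Hypothetical toy data: each dict is one customer, 1 = accepted that campaign.
# Column names mirror ifood_df.csv, but the values are invented for illustration.
customers = [
    {"AcceptedCmp1": 0, "AcceptedCmp4": 1, "Response": 1},
    {"AcceptedCmp1": 1, "AcceptedCmp4": 0, "Response": 1},
    {"AcceptedCmp1": 0, "AcceptedCmp4": 0, "Response": 0},
    {"AcceptedCmp1": 0, "AcceptedCmp4": 0, "Response": 1},
]


def response_rate(rows, column):
    """Share of customers who accepted the given campaign (mean of a 0/1 column)."""
    return sum(row[column] for row in rows) / len(rows)


rates = {col: response_rate(customers, col) for col in ("AcceptedCmp1", "AcceptedCmp4", "Response")}
print(rates)  # {'AcceptedCmp1': 0.25, 'AcceptedCmp4': 0.25, 'Response': 0.75}
```

A rate comparison alone does not say whether a gap is significant; because the same customers received every campaign, the notebook uses a paired test (McNemar) rather than an independent two-sample proportion test.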

```python
# import needed packages
import pandas as pd
from statsmodels.stats.contingency_tables import mcnemar
from mlxtend.frequent_patterns import apriori, association_rules
```

```python
# read in the data
ifood = pd.read_csv('ifood_df.csv')
```

```python
# subset the data to only include campaign responses
ifood_campaign = ifood[['AcceptedCmp1', 'AcceptedCmp2', 'AcceptedCmp3', 'AcceptedCmp4', 'AcceptedCmp5', 'Response']]
```

```python
# McNemar's test for paired proportions: every customer received every campaign,
# so each initial campaign is compared against the final campaign on the same customers
campaigns = ['AcceptedCmp1', 'AcceptedCmp2', 'AcceptedCmp3', 'AcceptedCmp4', 'AcceptedCmp5']
alpha = 0.05  # significance level applied to the adjusted p-values

# perform an exact McNemar test for each campaign vs. the final campaign
for i in campaigns:
    # 2x2 table: rows = accepted campaign i (0/1), columns = accepted final campaign (0/1)
    table = pd.crosstab(ifood_campaign[i], ifood_campaign['Response'])
    result = mcnemar(table, exact=True)
    p_value = result.pvalue

    # Bonferroni correction for the five comparisons (controls the family-wise
    # type I error rate); adjusted p-values are capped at 1
    p_value_adj = min(p_value * len(campaigns), 1.0)

    # print the result of each comparison
    print(f"{i} vs. Final Campaign: adjusted p-value={p_value_adj}")
```

```
AcceptedCmp1 vs. Final Campaign: adjusted p-value=1.2420643960807265e-27
AcceptedCmp2 vs. Final Campaign: adjusted p-value=1.7884392650263506e-78
AcceptedCmp3 vs. Final Campaign: adjusted p-value=4.7836055223779593e-20
AcceptedCmp4 vs. Final Campaign: adjusted p-value=4.158949524956411e-18
AcceptedCmp5 vs. Final Campaign: adjusted p-value=1.2415450493412101e-22
```
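With `exact=True`, statsmodels' `mcnemar` bases the p-value only on the discordant cells of each 2x2 table: customers who accepted one campaign but not the other. In the `pd.crosstab` layout used above, those are cell (1, 0) and cell (0, 1). The stdlib sketch below reproduces that exact binomial calculation together with the capped Bonferroni adjustment; the helpers `mcnemar_exact` and `bonferroni` and the discordant counts are hypothetical illustrations, not the statsmodels source or the ifood data.

```python
from math import comb


def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value from the two discordant cell counts.

    b: customers who accepted the initial campaign but not the final one
    c: customers who accepted the final campaign but not the initial one
    Under the null hypothesis of equal acceptance rates, and conditional on
    n = b + c, b follows a Binomial(n, 0.5) distribution.
    """
    n = b + c
    k = min(b, c)
    # lower-tail probability P(X <= k) for X ~ Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # double the tail for a two-sided test and cap at 1
    return min(1.0, 2 * tail)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical discordant counts (b, c) for five comparisons, not from ifood_df.csv
discordant = [(10, 30), (5, 40), (12, 25), (20, 22), (3, 3)]
alpha = 0.05
adjusted = bonferroni([mcnemar_exact(b, c) for b, c in discordant])
for (b, c), p_adj in zip(discordant, adjusted):
    print(f"b={b}, c={c}: adjusted p={p_adj:.4g}, reject H0: {p_adj < alpha}")
```

Customers who accepted both campaigns, or neither, carry no information about which campaign performed better, which is why only `b` and `c` enter the test.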


```python
# association analysis (Apriori)
# find all frequent itemsets of campaign responses (of any size, not only pairs);
# apriori expects boolean columns, so convert the 0/1 indicators first
frequent_itemsets = apriori(ifood_campaign.astype(bool), min_support=0.01, use_colnames=True)

# generate association rules with a lift of at least 5;
# num_itemsets is the number of transactions (customers) in the original data
rules = association_rules(frequent_itemsets, num_itemsets=len(ifood_campaign), metric="lift", min_threshold=5.0)

# keep only the rules whose consequents include the final-campaign response
rules_response = rules[rules['consequents'].apply(lambda x: 'Response' in x)]

# output the final rules table (sorted by confidence)
rules_table = rules_response[['antecedents', 'consequents', 'support', 'confidence', 'lift']].sort_values(by='confidence', ascending=False)
print(rules_table)
```

```
                                   antecedents  \
26  (AcceptedCmp4, AcceptedCmp1, AcceptedCmp5)
16                (AcceptedCmp1, AcceptedCmp5)
10                (AcceptedCmp1, AcceptedCmp4)
33                (AcceptedCmp1, AcceptedCmp4)
30                (AcceptedCmp4, AcceptedCmp5)
29                (AcceptedCmp1, AcceptedCmp5)
18                              (AcceptedCmp1)
19                              (AcceptedCmp5)
22                              (AcceptedCmp5)
23                              (AcceptedCmp4)
12                              (AcceptedCmp1)
13                              (AcceptedCmp4)
36                              (AcceptedCmp1)
34                              (AcceptedCmp5)
37                              (AcceptedCmp4)

                               consequents   support  confidence       lift
26                              (Response)  0.011791    0.838710   5.553618
16                              (Response)
```
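The `support`, `confidence`, and `lift` columns follow the standard association-rule definitions: support is the share of customers whose responses contain both the antecedent and the consequent, confidence is that support divided by the antecedent's support, and lift is confidence divided by the consequent's support. Because of that last relation, the first row of the table above (confidence 0.838710, lift 5.553618) implies that about 15.1% of customers responded to the final campaign. The stdlib sketch below computes the three metrics for one rule over a handful of hypothetical customers; `support` and `rule_metrics` are illustrative helpers, not part of mlxtend.

```python
# Hypothetical transactions: the set of campaigns each customer accepted (invented data).
transactions = [
    {"AcceptedCmp1", "AcceptedCmp5", "Response"},
    {"AcceptedCmp1", "AcceptedCmp5", "Response"},
    {"AcceptedCmp1", "AcceptedCmp5"},
    {"AcceptedCmp4", "Response"},
    {"AcceptedCmp4"},
    set(),
    set(),
    {"Response"},
]


def support(itemset, rows):
    """Fraction of rows that contain every item in itemset."""
    return sum(itemset <= row for row in rows) / len(rows)


def rule_metrics(antecedent, consequent, rows):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    both = support(antecedent | consequent, rows)
    confidence = both / support(antecedent, rows)
    lift = confidence / support(consequent, rows)
    return both, confidence, lift


s, conf, lift = rule_metrics({"AcceptedCmp1", "AcceptedCmp5"}, {"Response"}, transactions)
print(f"support={s:.3f}, confidence={conf:.3f}, lift={lift:.3f}")
# support=0.250, confidence=0.667, lift=1.333
```

A lift above 1 means the antecedent campaigns raise the chance of a final-campaign response above its baseline rate; the `min_threshold=5.0` filter keeps only rules where that chance is at least five times the baseline.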

