Part 2: Finding interesting relationships between product groups
Use association rule mining to find interesting relationships between product groups.

Your report should include a clear recommendation on how the company should use the results of the association rule mining to increase its revenue.

In [39]:
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

df2 = pd.read_csv('../data/drone_prod_groups.csv')
df2

Unnamed: 0,ID,Prod1,Prod2,Prod3,Prod4,Prod5,Prod6,Prod7,Prod8,Prod9,...,Prod11,Prod12,Prod13,Prod14,Prod15,Prod16,Prod17,Prod18,Prod19,Prod20
0,1,0,0,0,0,0,0,0,0,1,...,0,0,0,0,1,0,0,0,0,1
1,2,0,1,0,0,0,0,0,0,1,...,0,0,0,0,1,1,1,1,1,1
2,3,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,1,1
3,4,1,0,0,1,0,0,0,0,0,...,1,0,0,0,0,0,0,0,1,1
4,5,0,0,0,0,0,0,0,0,1,...,0,0,0,0,1,0,0,0,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
99995,99996,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
99996,99997,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
99997,99998,0,1,0,0,0,0,0,0,1,...,0,0,0,0,1,0,0,1,0,0
99998,99999,0,1,1,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,1,1


## Data preparation

mlxtend expects the data to be in boolean format before the Apriori algorithm is applied. We can also drop the ID column, as it is not needed for the analysis.

In [40]:
# drop id
df2 = df2.drop(columns=['ID'])

# replace 1 values with True and 0 with False
df2 = df2.astype(bool)
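
Because `astype(bool)` maps any nonzero value to `True`, it can be worth confirming that the columns really contain only 0/1 flags before casting. The following is a minimal sketch on a toy frame, not the actual dataset: the values are made up, and the column-name cleanup assumes that some headers may carry stray whitespace (the itemset labels printed later, such as `( Prod2)`, suggest this might be the case).

```python
import pandas as pd

# Toy stand-in for the product-group data (hypothetical values).
# ' Prod2' deliberately carries a leading space to mimic a messy CSV header.
toy = pd.DataFrame({"ID": [1, 2, 3], "Prod1": [0, 1, 0], " Prod2": [1, 1, 0]})

# Drop the identifier so only the product flags remain.
flags = toy.drop(columns=["ID"])

# Guard: astype(bool) would silently turn any nonzero value into True.
if not flags.isin([0, 1]).all().all():
    raise ValueError("found values other than 0/1 in the product columns")

# Normalise column names so itemsets read 'Prod2' rather than ' Prod2'.
flags.columns = flags.columns.str.strip()

# Cast the 0/1 flags to booleans.
basket = flags.astype(bool)
print(basket.dtypes)
```

If the guard raises, the offending columns should be inspected before any rules are mined from them.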


Now the dataset is ready for the Apriori algorithm. We can apply it to find the frequent itemsets and then generate association rules.

## Finding frequent itemsets

In [41]:
# find frequent itemsets
frequent_itemsets = apriori(df2, min_support=0.1, use_colnames=True)
frequent_itemsets

Unnamed: 0,support,itemsets
0,0.10998,(Prod1)
1,0.13098,( Prod2)
2,0.10459,( Prod5)
3,0.13499,( Prod7)
4,0.16179,( Prod8)
5,0.19853,( Prod9)
6,0.10848,( Prod11)
7,0.15971,( Prod12)
8,0.14557,( Prod14)
9,0.1188,( Prod15)
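
To make the `support` column concrete: the support of an itemset is the fraction of transactions that contain every item in it, and `min_support` is the cut-off below which an itemset is discarded. The sketch below recomputes support by hand on five hypothetical transactions. The helper `support` and the threshold of 0.4 are illustrative only and are not part of mlxtend.

```python
# Five hypothetical transactions, each a set of product groups.
transactions = [
    {"Prod9", "Prod15"},
    {"Prod9"},
    {"Prod9", "Prod15", "Prod2"},
    {"Prod2"},
    {"Prod7"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# Illustrative threshold for this toy data, not the 0.1 used in the notebook.
min_support = 0.4

for items in [("Prod9",), ("Prod15",), ("Prod9", "Prod15"), ("Prod7",)]:
    s = support(items, transactions)
    status = "frequent" if s >= min_support else "dropped"
    print(items, s, status)
```

Apriori avoids checking every possible combination by relying on the fact that a set can only be frequent if all of its subsets are frequent.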


In [42]:
# generate association rules
rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.5)

# sort in descending order of confidence
rules = rules.sort_values(by='confidence', ascending=False)

rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,( Prod15),( Prod9),0.1188,0.19853,0.11145,0.938131,4.725388,1.0,0.087865,12.954372,0.894663,0.541335,0.922806,0.749754
3,( Prod20),( Prod19),0.14798,0.20626,0.13476,0.910664,4.415125,1.0,0.104238,8.884845,0.907849,0.613997,0.887449,0.782007
2,( Prod19),( Prod20),0.20626,0.14798,0.13476,0.65335,4.415125,1.0,0.104238,2.457869,0.974508,0.613997,0.593144,0.782007
1,( Prod9),( Prod15),0.19853,0.1188,0.11145,0.561376,4.725388,1.0,0.087865,2.009011,0.983664,0.541335,0.502243,0.749754


## Evaluation

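Lift measures how many times more often the items are purchased together than would be expected if they were purchased independently. A lift of 1 corresponds to independence, and values above 1 indicate a positive association.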

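- Products {15, 9} have a lift of 4.7. About 94% of the transactions that contain Prod15 also contain Prod9 (confidence 0.938).
- Products {20, 19} have a lift of 4.4. About 91% of the transactions that contain Prod20 also contain Prod19 (confidence 0.911).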
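
The confidence and lift figures can be rechecked from the support columns alone: confidence is the joint support divided by the antecedent support, and lift is the confidence divided by the consequent support. The short sketch below uses values copied from the rules table; the dictionary layout and helper names are illustrative, not mlxtend's API.

```python
# Support values copied from the rules table (rounded as displayed there).
rules_summary = {
    "Prod15 -> Prod9": {
        "antecedent_support": 0.1188,
        "consequent_support": 0.19853,
        "support": 0.11145,
    },
    "Prod20 -> Prod19": {
        "antecedent_support": 0.14798,
        "consequent_support": 0.20626,
        "support": 0.13476,
    },
}

def confidence(rule):
    """Estimated P(consequent | antecedent) = joint support / antecedent support."""
    return rule["support"] / rule["antecedent_support"]

def lift(rule):
    """Confidence relative to the consequent's baseline frequency."""
    return confidence(rule) / rule["consequent_support"]

for name, rule in rules_summary.items():
    print(f"{name}: confidence={confidence(rule):.3f}, lift={lift(rule):.2f}")
```

Lift is symmetric, which is why each pair shows the same lift in both directions in the table, while confidence depends on which product is the antecedent.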

### Recommendation

- To increase revenue, the company should offer combo deals containing {Prod 15, Prod 9} and {Prod 20, Prod 19}.
- The company should also place these products near each other to make joint purchases more convenient.
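
One way to judge where a combo deal has the most room to add sales is to look at how often each product in a pair is bought without its partner. The sketch below derives those shares from the supports in the rules table. It is a rough sizing aid under the assumption that such transactions are the ones a bundle could convert, not a revenue forecast, and the helper name is hypothetical.

```python
# Supports copied from the rules table (fractions of all transactions).
pairs = {
    ("Prod9", "Prod15"): {"first": 0.19853, "second": 0.1188, "both": 0.11145},
    ("Prod19", "Prod20"): {"first": 0.20626, "second": 0.14798, "both": 0.13476},
}

def share_without_partner(stats):
    """Return the shares of transactions with only the first / only the second item."""
    only_first = stats["first"] - stats["both"]
    only_second = stats["second"] - stats["both"]
    return only_first, only_second

for (first, second), stats in pairs.items():
    only_first, only_second = share_without_partner(stats)
    print(f"{first} without {second}: {only_first:.4f} of transactions")
    print(f"{second} without {first}: {only_second:.4f} of transactions")
```

On these figures, Prod9 and Prod19 are the products most often bought without their partners, so offers that add Prod15 to a Prod9 purchase, or Prod20 to a Prod19 purchase, are a plausible place to start.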