**Business Problem Understanding**

* Identify the most strongly associated items so they can be placed together and marketed jointly to improve sales at a retail clothing store.


In [15]:
# Step 1: Define the dataset

dataset = [['T-shirt', 'Trousers', 'Belt'],
           ['T-shirt', 'Jacket'],
           ['Jacket', 'Gloves'],
           ['T-shirt', 'Trousers', 'Jacket'],
           ['T-shirt', 'Trousers', 'Sneakers', 'Jacket', 'Belt'],
           ['Trousers', 'Sneakers', 'Belt'],
           ['Trousers', 'Belt', 'Sneakers']]

In [4]:
import pandas as pd

In [5]:
!pip install mlxtend

Collecting mlxtend
  Downloading mlxtend-0.23.4-py3-none-any.whl.metadata (7.3 kB)
Downloading mlxtend-0.23.4-py3-none-any.whl (1.4 MB)
Installing collected packages: mlxtend
Successfully installed mlxtend-0.23.4


In [16]:
# Step 2: One-hot encode the transactions with TransactionEncoder, then wrap the result in a DataFrame

from mlxtend.preprocessing import TransactionEncoder
te = TransactionEncoder()
te_array = te.fit_transform(dataset)

In [8]:
df = pd.DataFrame(te_array, columns=te.columns_)
df

,Belt,Gloves,Jacket,Sneakers,T-shirt,Trousers
0,True,False,False,False,True,True
1,False,False,True,False,True,False
2,False,True,True,False,False,False
3,False,False,True,False,True,True
4,True,False,True,True,True,True
5,True,False,False,True,False,True
6,True,False,False,True,False,True
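
As a cross-check on the encoded table above, the same one-hot layout can be reproduced in plain Python. This is only a minimal sketch of the idea, not how `TransactionEncoder` is implemented; the names `items` and `encoded` are illustrative and do not appear in the notebook.

```python
# Same transactions as in Step 1
dataset = [['T-shirt', 'Trousers', 'Belt'],
           ['T-shirt', 'Jacket'],
           ['Jacket', 'Gloves'],
           ['T-shirt', 'Trousers', 'Jacket'],
           ['T-shirt', 'Trousers', 'Sneakers', 'Jacket', 'Belt'],
           ['Trousers', 'Sneakers', 'Belt'],
           ['Trousers', 'Belt', 'Sneakers']]

# Distinct items in sorted order, matching the column order of the table
items = sorted({item for transaction in dataset for item in transaction})

# One row per transaction: True where the item is present in that basket
encoded = [[item in transaction for item in items] for transaction in dataset]

print(items)
for row in encoded:
    print(row)
```

Each printed row should line up with the corresponding row of the DataFrame.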


In [17]:
# Step 3: Apply Apriori algorithm to find frequent itemsets

from mlxtend.frequent_patterns import apriori
itemset = apriori(df, min_support=0.5, use_colnames=True)
itemset

,support,itemsets
0,0.571429,(Belt)
1,0.571429,(Jacket)
2,0.571429,(T-shirt)
3,0.714286,(Trousers)
4,0.571429,"(Belt, Trousers)"
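
The support values above can be checked by counting directly. The sketch below brute-forces all single items and pairs rather than using Apriori's candidate pruning, so it illustrates the definition of support (the fraction of transactions containing the itemset), not the algorithm itself. The helper `support` and the dictionary `frequent` are hypothetical names introduced for this check.

```python
from itertools import combinations

dataset = [['T-shirt', 'Trousers', 'Belt'],
           ['T-shirt', 'Jacket'],
           ['Jacket', 'Gloves'],
           ['T-shirt', 'Trousers', 'Jacket'],
           ['T-shirt', 'Trousers', 'Sneakers', 'Jacket', 'Belt'],
           ['Trousers', 'Sneakers', 'Belt'],
           ['Trousers', 'Belt', 'Sneakers']]

transactions = [set(t) for t in dataset]
n = len(transactions)

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(set(itemset) <= t for t in transactions) / n

items = sorted(set().union(*transactions))
min_support = 0.5

# Brute-force check of itemsets of size 1 and 2 (no Apriori pruning)
frequent = {}
for size in (1, 2):
    for candidate in combinations(items, size):
        s = support(candidate)
        if s >= min_support:
            frequent[candidate] = s

for itemset, s in frequent.items():
    print(itemset, round(s, 6))
```

Only one pair clears the threshold, so no itemset of three items can be frequent, which is why the check stops at pairs.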


In [18]:
# Step 4: Generate association rules with confidence >= 0.6

from mlxtend.frequent_patterns import association_rules
res = association_rules(itemset, metric="confidence", min_threshold=0.6)
res

,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Belt),(Trousers),0.571429,0.714286,0.571429,1.0,1.4,1.0,0.163265,inf,0.666667,0.8,1.0,0.9
1,(Trousers),(Belt),0.714286,0.571429,0.571429,0.8,1.4,1.0,0.163265,2.142857,1.0,0.8,0.533333,0.9
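
To make the rule metrics above less opaque, here is a small sketch that recomputes confidence, lift, leverage, and conviction from raw transaction counts using their standard definitions. The counts (Belt in 4 baskets, Trousers in 5, both in 4, out of 7) are read off the encoded table; the function `rule_metrics` is a hypothetical helper, not part of mlxtend.

```python
import math

def rule_metrics(count_a, count_c, count_both, n):
    """Metrics for the rule A -> C from raw transaction counts."""
    sup_a = count_a / n        # antecedent support
    sup_c = count_c / n        # consequent support
    sup = count_both / n       # support of A and C together
    confidence = count_both / count_a
    lift = confidence / sup_c
    leverage = sup - sup_a * sup_c
    # Conviction is infinite when the rule never fails (confidence == 1)
    conviction = math.inf if confidence == 1 else (1 - sup_c) / (1 - confidence)
    return {'support': sup, 'confidence': confidence, 'lift': lift,
            'leverage': leverage, 'conviction': conviction}

n = 7
belt_to_trousers = rule_metrics(count_a=4, count_c=5, count_both=4, n=n)
trousers_to_belt = rule_metrics(count_a=5, count_c=4, count_both=4, n=n)

print(belt_to_trousers)
print(trousers_to_belt)
```

Lift and leverage are symmetric in the two items, which is why both rows of the table share the same values for them, while confidence and conviction depend on the direction of the rule.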


In [19]:
# Step 5: Select key columns from the association rules result

result = res[['antecedents', 'consequents', 'support', 'confidence', 'lift']]
result

,antecedents,consequents,support,confidence,lift
0,(Belt),(Trousers),0.571429,1.0,1.4
1,(Trousers),(Belt),0.571429,0.8,1.4


**Observations:**

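* Belt and Trousers are the only pair that meets the 0.5 minimum-support threshold: they appear together in 4 of the 7 transactions (support ≈ 0.57). Every basket that contains a Belt also contains Trousers (confidence 1.0), and 80% of baskets with Trousers also contain a Belt (confidence 0.8). The lift of 1.4 means the two items are bought together 1.4 times as often as would be expected if they were independent, so placing and marketing them together as a combo is a reasonable way to improve sales.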