# Programming Assignment #5 

## Association rule mining 

The market basket transactions dataset (transactions_data.txt) contains the list of items purchased by a customer in each transaction.

- load the transactions dataset file
- use minimum support = 0.2 and use_colnames=True in the apriori method
- select confidence as the metric in association_rules
- use minimum threshold = 0.5

Example: with minimum support 0.4, confidence as the metric, and a minimum threshold of 0.5, some of the outputs are:
- the least frequent of the frequent 1-itemsets is ['Queso'].
- the support, confidence, and lift of the rule ['Queso'] -> ['Tortilla chips'] are:
  - consequent support = 0.7
  - support = 0.4
  - confidence = 1.00
  - lift = 1.43
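
The figures in this example can be checked by hand. The sketch below is a minimal illustration in plain Python, not the mlxtend implementation: it recomputes support, confidence, and lift for the rule ['Queso'] -> ['Tortilla chips'] from the ten transactions printed later in this notebook. The helper name `support` is hypothetical and introduced only for this illustration.

```python
# The ten transactions from transactions_data.txt, as printed in this notebook.
dataset = [
    ['Lime', 'Queso', 'Salsa', 'Salt', 'Tortilla chips'],
    ['Ranch dip', 'Salsa', 'Tortilla chips'],
    ['Queso', 'Tortilla chips'],
    ['Potato chips', 'Ranch dip'],
    ['Salsa', 'Tortilla chips'],
    ['Queso', 'Salsa', 'Tortilla chips'],
    ['Pita chips', 'Ranch dip'],
    ['Guacamole', 'Tortilla chips'],
    ['Guacamole', 'Queso', 'Salsa', 'Tortilla chips'],
    ['Pita chips', 'Salsa'],
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset` (hypothetical helper)."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


antecedent = {'Queso'}
consequent = {'Tortilla chips'}

# support(A -> C) = support(A and C together)
rule_support = support(antecedent | consequent, dataset)
# confidence(A -> C) = support(A and C) / support(A)
confidence = rule_support / support(antecedent, dataset)
# lift(A -> C) = confidence(A -> C) / support(C)
lift = confidence / support(consequent, dataset)

print("consequent support =", support(consequent, dataset))
print("support =", rule_support)
print("confidence =", confidence)
print("lift =", round(lift, 2))
```

A lift above 1 means that 'Queso' and 'Tortilla chips' appear together more often than they would if the two items were independent.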

In [8]:
# Import the packages 
import numpy as np

In [9]:

# Load the transactions dataset 
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

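# Read one transaction per line; items on a line are comma-separated
def load_dataset(path_to_data):
    transactions = []
    with open(path_to_data, 'r') as fid:
        for line in fid:
            line = line.strip()
            if line:  # skip blank lines so they do not become empty transactions
                transactions.append([item.strip() for item in line.split(',')])
    return transactions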

path_to_data = "transactions_data.txt"  
dataset = load_dataset(path_to_data)
dataset


[['Lime', 'Queso', 'Salsa', 'Salt', 'Tortilla chips'],
 ['Ranch dip', 'Salsa', 'Tortilla chips'],
 ['Queso', 'Tortilla chips'],
 ['Potato chips', 'Ranch dip'],
 ['Salsa', 'Tortilla chips'],
 ['Queso', 'Salsa', 'Tortilla chips'],
 ['Pita chips', 'Ranch dip'],
 ['Guacamole', 'Tortilla chips'],
 ['Guacamole', 'Queso', 'Salsa', 'Tortilla chips'],
 ['Pita chips', 'Salsa']]
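
Before running apriori, it can help to see what the encoding step and the 1-itemset supports look like. The following is a rough sketch in plain Python, assuming the ten transactions printed above: it builds one boolean column per distinct item (the idea behind one-hot encoding, not the TransactionEncoder code itself), takes each item's support as the fraction of rows where its column is True, and picks the least frequent item among those meeting the minimum support. The names `encoded`, `item_support`, and `frequent_1` are hypothetical, and `min_support = 0.4` is the value from the worked example.

```python
# The ten transactions printed above.
dataset = [
    ['Lime', 'Queso', 'Salsa', 'Salt', 'Tortilla chips'],
    ['Ranch dip', 'Salsa', 'Tortilla chips'],
    ['Queso', 'Tortilla chips'],
    ['Potato chips', 'Ranch dip'],
    ['Salsa', 'Tortilla chips'],
    ['Queso', 'Salsa', 'Tortilla chips'],
    ['Pita chips', 'Ranch dip'],
    ['Guacamole', 'Tortilla chips'],
    ['Guacamole', 'Queso', 'Salsa', 'Tortilla chips'],
    ['Pita chips', 'Salsa'],
]

# One boolean column per distinct item; sorting gives a stable column order.
items = sorted({item for basket in dataset for item in basket})
encoded = [[item in basket for item in items] for basket in dataset]

# Support of a single item = fraction of rows where its column is True.
n = len(encoded)
item_support = {
    item: sum(row[j] for row in encoded) / n
    for j, item in enumerate(items)
}

min_support = 0.4  # value from the worked example; change it to try other thresholds
frequent_1 = {item: s for item, s in item_support.items() if s >= min_support}

# Least frequent among the frequent 1-itemsets (first one wins if there is a tie).
least_frequent = min(frequent_1, key=frequent_1.get)

print(frequent_1)
print(least_frequent)
```

With a lower threshold, several items can share the smallest support, so a full answer may need to report all of the tied items rather than only the first.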

In [10]:

# Transform the data to a format suitable for the apriori function
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Apply the apriori algorithm with the worked example's minimum support of 0.4 (the assignment asks for 0.2)
frequent_itemsets = apriori(df, min_support=0.4, use_colnames=True)
print("Frequent Itemsets:")
print(frequent_itemsets)

# Generate the association rules with confidence threshold of 0.5
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)
print("\nAssociation Rules:")
print(rules)


Frequent Itemsets:
   support                 itemsets
0      0.4                  (Queso)
1      0.6                  (Salsa)
2      0.7         (Tortilla chips)
3      0.4  (Queso, Tortilla chips)
4      0.5  (Salsa, Tortilla chips)

Association Rules:
        antecedents       consequents  antecedent support  consequent support  \
0           (Queso)  (Tortilla chips)                 0.4                 0.7   
1  (Tortilla chips)           (Queso)                 0.7                 0.4   
2           (Salsa)  (Tortilla chips)                 0.6                 0.7   
3  (Tortilla chips)           (Salsa)                 0.7                 0.6   

   support  confidence      lift  leverage  conviction  zhangs_metric  
0      0.4    1.000000  1.428571      0.12         inf       0.500000  
1      0.4    0.571429  1.428571      0.12         1.4       1.000000  
2      0.5    0.833333  1.190476      0.08         1.8       0.400000  
3      0.5    0.714286  1.190476      0.08         