# Programming Assignment #5 

## Association rule mining 

The market basket transactions dataset (transactions_data.txt) contains, for each transaction, the list of items the customer purchased.

- Load the transaction dataset file.
- Use minimum support = 0.2 and use_colnames=True in the apriori function.
- Select confidence as the metric in association_rules.
- Use minimum threshold = 0.5.
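These parameters can be illustrated without mlxtend. The sketch below is a brute-force check, not the Apriori algorithm. It assumes the ten transactions the notebook loads from transactions_data.txt, hardcoded here as a hypothetical `TRANSACTIONS` list. It tests every combination of items, keeps those with support ≥ 0.2, and then emits each rule X -> Y whose confidence is ≥ 0.5. The helper names (`itemset_support`, `brute_force_itemsets`, `brute_force_rules`) are illustrative only.

```python
from itertools import combinations

# Assumption: these are the ten transactions the notebook loads from
# transactions_data.txt (copied from the printed `dataset` output).
TRANSACTIONS = [
    {'Lime', 'Queso', 'Salsa', 'Salt', 'Tortilla chips'},
    {'Ranch dip', 'Salsa', 'Tortilla chips'},
    {'Queso', 'Tortilla chips'},
    {'Potato chips', 'Ranch dip'},
    {'Salsa', 'Tortilla chips'},
    {'Queso', 'Salsa', 'Tortilla chips'},
    {'Pita chips', 'Ranch dip'},
    {'Guacamole', 'Tortilla chips'},
    {'Guacamole', 'Queso', 'Salsa', 'Tortilla chips'},
    {'Pita chips', 'Salsa'},
]
MIN_SUPPORT = 0.2     # same value passed to apriori(min_support=...)
MIN_CONFIDENCE = 0.5  # same value passed to association_rules(min_threshold=...)


def itemset_support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def brute_force_itemsets(transactions, min_support):
    """Test every combination of items (no Apriori pruning; fine for 9 items)."""
    items = sorted(set().union(*transactions))
    found = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = itemset_support(frozenset(combo), transactions)
            if s >= min_support:
                found[frozenset(combo)] = s
    return found


def brute_force_rules(freq, min_confidence):
    """Split each frequent itemset into (antecedent -> consequent) rules."""
    out = []
    for itemset, s in freq.items():
        for k in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), k):
                ante = frozenset(combo)
                cons = itemset - ante
                # Every subset of a frequent itemset is itself frequent,
                # so both dictionary lookups below succeed.
                conf = s / freq[ante]
                if conf >= min_confidence:
                    out.append((ante, cons, s, conf, conf / freq[cons]))
    return out


freq = brute_force_itemsets(TRANSACTIONS, MIN_SUPPORT)
found_rules = brute_force_rules(freq, MIN_CONFIDENCE)
for ante, cons, s, conf, lift in found_rules:
    print(sorted(ante), '->', sorted(cons),
          f'support={s:.2f} confidence={conf:.2f} lift={lift:.2f}')
```

With these thresholds the sketch finds 11 frequent itemsets, the same count as the apriori table printed further down. Enumerating all 2^9 item combinations is only reasonable because the item universe is tiny. Apriori exists to avoid exactly this blow-up.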

Example: if the minimum support is 0.4, the metric is confidence, and the minimum threshold is 0.5, then some of the outputs are:
- the frequent 1-itemset with the lowest support is ['Queso'].
- for the rule ['Queso'] -> ['Tortilla chips']:
  - consequent support = 0.7
  - support = 0.4
  - confidence = 1.00
  - lift ≈ 1.43
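The numbers in this example follow from the standard definitions:
- support(X -> Y) = support(X ∪ Y)
- confidence = support(X ∪ Y) / support(X)
- lift = confidence / support(Y)

The minimal sketch below assumes the same ten transactions the notebook loads later, hardcoded as a hypothetical `baskets` list. It recomputes the example with the hypothetical helper `support_of`.

```python
# Assumption: the ten transactions from transactions_data.txt, as loaded later on.
baskets = [
    {'Lime', 'Queso', 'Salsa', 'Salt', 'Tortilla chips'},
    {'Ranch dip', 'Salsa', 'Tortilla chips'},
    {'Queso', 'Tortilla chips'},
    {'Potato chips', 'Ranch dip'},
    {'Salsa', 'Tortilla chips'},
    {'Queso', 'Salsa', 'Tortilla chips'},
    {'Pita chips', 'Ranch dip'},
    {'Guacamole', 'Tortilla chips'},
    {'Guacamole', 'Queso', 'Salsa', 'Tortilla chips'},
    {'Pita chips', 'Salsa'},
]


def support_of(items):
    """Fraction of baskets that contain every item in `items`."""
    return sum(set(items) <= b for b in baskets) / len(baskets)


# Least frequent 1-itemset among those with support >= 0.4
single = {item: support_of({item}) for item in set().union(*baskets)}
frequent_1 = {item: s for item, s in single.items() if s >= 0.4}
least = min(frequent_1, key=frequent_1.get)

# Metrics of the rule {Queso} -> {Tortilla chips}
antecedent, consequent = {'Queso'}, {'Tortilla chips'}
rule_support = support_of(antecedent | consequent)  # 4/10 = 0.4
confidence = rule_support / support_of(antecedent)  # 0.4 / 0.4 = 1.0
lift = confidence / support_of(consequent)          # 1.0 / 0.7 ≈ 1.4286

print(least, support_of(consequent), rule_support, confidence, round(lift, 2))
# → Queso 0.7 0.4 1.0 1.43
```

A lift above 1 means that buying Queso makes buying Tortilla chips more likely than its baseline rate of 0.7.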

In [1]:
```python
# Import the packages
import numpy as np
```

In [4]:
```python
%pip install mlxtend
```


```text
Collecting mlxtend
  Downloading mlxtend-0.23.1-py3-none-any.whl.metadata (7.3 kB)
Downloading mlxtend-0.23.1-py3-none-any.whl (1.4 MB)
Installing collected packages: mlxtend
Successfully installed mlxtend-0.23.1
Note: you may need to restart the kernel to use updated packages.
```


In [2]:
```python
# Load the transactions dataset
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

def load_dataset(path_to_data):
    """Read one comma-separated transaction per line into a list of item lists."""
    transactions = []
    with open(path_to_data, 'r') as fid:
        for line in fid:
            line = line.strip()
            if not line:  # skip blank lines (e.g. an empty line at the end of the file)
                continue
            transactions.append([item.strip() for item in line.split(',')])
    return transactions

path_to_data = "transactions_data.txt"
dataset = load_dataset(path_to_data)
dataset
```

```text
[['Lime', 'Queso', 'Salsa', 'Salt', 'Tortilla chips'],
 ['Ranch dip', 'Salsa', 'Tortilla chips'],
 ['Queso', 'Tortilla chips'],
 ['Potato chips', 'Ranch dip'],
 ['Salsa', 'Tortilla chips'],
 ['Queso', 'Salsa', 'Tortilla chips'],
 ['Pita chips', 'Ranch dip'],
 ['Guacamole', 'Tortilla chips'],
 ['Guacamole', 'Queso', 'Salsa', 'Tortilla chips'],
 ['Pita chips', 'Salsa']]
```
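TransactionEncoder, used in the next cell, converts these lists into a one-hot boolean DataFrame, which is the input format apriori expects. The sketch below builds what we assume is an equivalent table with plain pandas. `one_hot` is a hypothetical helper, not an mlxtend function. The alphabetical column order reflects our understanding of how TransactionEncoder orders `columns_`.

```python
import pandas as pd

# Assumption: the ten transactions printed above, copied verbatim.
baskets = [
    ['Lime', 'Queso', 'Salsa', 'Salt', 'Tortilla chips'],
    ['Ranch dip', 'Salsa', 'Tortilla chips'],
    ['Queso', 'Tortilla chips'],
    ['Potato chips', 'Ranch dip'],
    ['Salsa', 'Tortilla chips'],
    ['Queso', 'Salsa', 'Tortilla chips'],
    ['Pita chips', 'Ranch dip'],
    ['Guacamole', 'Tortilla chips'],
    ['Guacamole', 'Queso', 'Salsa', 'Tortilla chips'],
    ['Pita chips', 'Salsa'],
]


def one_hot(transactions):
    """One row per transaction, one boolean column per distinct item (sorted)."""
    items = sorted(set().union(*transactions))
    rows = [[item in basket for item in items] for basket in map(set, transactions)]
    return pd.DataFrame(rows, columns=items)


encoded = one_hot(baskets)
print(encoded.shape)            # (10, 9): ten transactions, nine distinct items
# The mean of a boolean column is the fraction of True rows,
# i.e. the support of that single item.
print(encoded.mean().round(2))
```

The column means reproduce the 1-itemset supports. Apriori starts from these values and then only extends itemsets whose support already meets the threshold.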

In [3]:
```python
# Transform the data to a format suitable for the apriori function
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Apply the apriori algorithm
frequent_itemsets = apriori(df, min_support=0.2, use_colnames=True)
print("Frequent Itemsets:")
print(frequent_itemsets)

# Generate the association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)
print("\nAssociation Rules:")
print(rules)
```

```text
Frequent Itemsets:
    support                        itemsets
0       0.2                     (Guacamole)
1       0.2                    (Pita chips)
2       0.4                         (Queso)
3       0.3                     (Ranch dip)
4       0.6                         (Salsa)
5       0.7                (Tortilla chips)
6       0.2     (Guacamole, Tortilla chips)
7       0.3                  (Salsa, Queso)
8       0.4         (Tortilla chips, Queso)
9       0.5         (Salsa, Tortilla chips)
10      0.3  (Salsa, Tortilla chips, Queso)
```

Association Rules:
                antecedents              consequents  antecedent support  \
0               (Guacamole)         (Tortilla chips)                 0.2   
1                   (Salsa)                  (Queso)                 0.6   
2                   (Queso)                  (Salsa)                 0.4   
3          (Tortilla chips)                  (Queso)                 0.7   
4                   (Queso)         (Tortilla chips) 