# Association Rules

#### In this section, we will learn how to use association rules in Python and how to filter data based on different association rule metrics.


The following libraries are used for association rules:
- pandas
- numpy
- matplotlib
- mlxtend

In [1]:
# Import necessary modules

import numpy as np
import pandas as pd
import csv
from matplotlib import pyplot as plt

# Import FP-growth and Apriori modules, TransactionEncoder module and association module from mlxtend

from mlxtend.frequent_patterns import apriori
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import association_rules as arule
from mlxtend.frequent_patterns import fpgrowth

### Read data from Repair.csv

We use the Repair.csv file as the data set. The FP-growth and Apriori algorithms are then used to find frequent itemsets in it.


In [2]:
# Read 'Repair.csv' into a list of transactions (one list of items per row), the format the algorithms expect

data_set = []

with open("Repair.csv") as csvFile:
    reader = csv.reader(csvFile)
    for row in reader:
        data_set.append(row)
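
To make the resulting structure concrete, here is a minimal sketch of the same reading loop run on an in-memory string instead of a file. The two rows are hypothetical (the real Repair.csv may be laid out differently); the item names are borrowed from the column headers shown later in this section.

```python
import csv
import io

# Hypothetical file contents; the real Repair.csv may look different.
sample_csv = (
    "Register,Analyze Defect,Repair (Simple),Test Repair\n"
    "Register,Analyze Defect,Repair (Complex),Test Repair,Inform User\n"
)

transactions = []
with io.StringIO(sample_csv) as csv_file:
    # Each CSV row becomes one transaction: a list of item names.
    for row in csv.reader(csv_file):
        transactions.append(row)

print(transactions[0])
print(len(transactions[1]))
```

Rows may have different lengths, because each transaction lists only the items it contains.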


### FP-growth algorithm

We first apply the FP-growth algorithm to the Repair.csv data set to find frequent itemsets.

In [3]:
# Use TransactionEncoder to convert the list of transactions into a one-hot encoded DataFrame for the FP-growth algorithm in mlxtend

te = TransactionEncoder()
te_ary = te.fit(data_set).transform(data_set)
data = pd.DataFrame(te_ary, columns = te.columns_)
data.tail(5)




| | Analyze Defect | Archive Repair | Inform User | Register | Repair (Complex) | Repair (Simple) | Restart Repair | Test Repair |
|---|---|---|---|---|---|---|---|---|
| 1099 | True | True | True | True | True | False | False | True |
| 1100 | True | True | True | True | True | False | False | True |
| 1101 | True | True | True | True | True | False | False | True |
| 1102 | True | True | True | True | True | False | False | True |
| 1103 | True | True | True | True | False | True | True | True |


In [None]:
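# Mine frequent itemsets with FP-growth, using a minimum support of 0.3
frequent_itemsets = fpgrowth(data, min_support=0.3, use_colnames=True)
print(frequent_itemsets)

# Record the number of items in each itemset
frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(len)

# Keep the itemsets with more than 3 items and a support greater than 0.3
frequent_itemsets_filtered = frequent_itemsets.loc[(frequent_itemsets['length'] > 3) & (frequent_itemsets['support'] > 0.3)]
frequent_itemsets_filtered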

### Apriori algorithm

Next, the Apriori algorithm is applied to the same Repair.csv data set to find frequent itemsets.

In [None]:
# Use TransactionEncoder to convert the list of transactions into a one-hot encoded DataFrame for the Apriori algorithm in mlxtend

te = TransactionEncoder()
te_ary = te.fit(data_set).transform(data_set)
data = pd.DataFrame(te_ary, columns = te.columns_)
data.tail(5)


In [None]:
frequent_itemsets = apriori(data, min_support = 0.3, use_colnames = True)

### Filtering data based on metrics of association rules
In Python, you can filter association rules based on different metrics such as support, confidence, and lift.

In [None]:
# Generate association rules with mlxtend and keep only the rules whose lift is at least 0.8.

rules_association = arule(frequent_itemsets, metric = 'lift', min_threshold = 0.8)
rules_association
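
To see what these metrics measure, the sketch below computes them by hand for a single rule, using the standard definitions: the support of a rule is the fraction of transactions containing both sides, confidence is that support divided by the support of the antecedent, and lift is the confidence divided by the support of the consequent. The four transactions are hypothetical and serve only to illustrate the arithmetic.

```python
# Hypothetical transactions, used only to illustrate the definitions.
transactions = [
    {"Register", "Analyze Defect", "Repair (Simple)"},
    {"Register", "Analyze Defect", "Repair (Complex)"},
    {"Register", "Analyze Defect"},
    {"Register"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


# Rule under inspection: {Register} -> {Analyze Defect}
antecedent = {"Register"}
consequent = {"Analyze Defect"}

rule_support = support(antecedent | consequent)
confidence = rule_support / support(antecedent)
lift = confidence / support(consequent)

print(rule_support, confidence, lift)  # prints: 0.75 0.75 1.0
```

A lift of 1 means the antecedent tells us nothing new about the consequent. Here that happens because "Register" occurs in every transaction. A threshold below 1, such as the 0.8 used above, therefore also lets through rules whose sides are independent or mildly negatively associated.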

#### Question: Change the metric to confidence and then to support. Investigate the effect of each change on the table.

### Finding qualified frequent itemsets using association rules
Frequent itemsets that meet a minimum support threshold are the starting point for mining association rules, and the same steps apply to other data sets.

#### Question: Find the frequent itemsets with a minimum support of 0.2. Store them in the variable frequent_itemsets.


In [None]:
#Answer
frequent_itemsets = apriori(data, min_support = 0.2, use_colnames = True)
frequent_itemsets

### Filtering itemsets based on length and metrics of association rules
In this section, you will learn how to filter frequent itemsets based on their length.

In [None]:
# Add a column named 'length' to 'frequent_itemsets' that holds the number of items in each frequent itemset.

frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(len)

# Select the frequent itemsets that contain more than 2 items and have a support greater than 0.3.
# Store the selected itemsets in the variable 'frequent_itemsets_filtered'.

frequent_itemsets_filtered = frequent_itemsets.loc[(frequent_itemsets['length'] > 2) & (frequent_itemsets['support'] > 0.3)]
frequent_itemsets_filtered
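
The same filtering pattern can be followed on a small hand-made table. In this sketch the itemsets and support values are hypothetical; only pandas is needed.

```python
import pandas as pd

# Hypothetical frequent itemsets; the support values are made up.
toy_itemsets = pd.DataFrame(
    {
        "support": [0.9, 0.6, 0.35, 0.25],
        "itemsets": [
            frozenset({"A"}),
            frozenset({"A", "B"}),
            frozenset({"A", "B", "C"}),
            frozenset({"A", "B", "C", "D"}),
        ],
    }
)

# len() of a frozenset is the number of items in the itemset.
toy_itemsets["length"] = toy_itemsets["itemsets"].apply(len)

# Each condition needs its own parentheses because & binds tighter than >.
mask = (toy_itemsets["length"] > 2) & (toy_itemsets["support"] > 0.3)
print(toy_itemsets[mask])
```

Only the three-item itemset survives: the four-item one is long enough but falls below the support threshold.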

### Demonstrating selective metrics of association rules in one table
In this section, you will learn how to show selected metrics of association rules in one table.

In [None]:
# Mine association rules from the frequent itemsets stored in 'frequent_itemsets', with a minimum confidence of 0.5.
# Store the discovered rules in the variable 'rules_association'.

rules_association = arule(frequent_itemsets, metric = 'confidence', min_threshold = 0.5)

# Select the rules with lift larger than 1 and support larger than 0.4, and store them in the variable 'filtered_rules'.

filtered_rules = rules_association.loc[(rules_association['lift'] > 1) & (rules_association['support'] > 0.4)]

# Show the columns 'antecedents', 'consequents', 'support', 'confidence' and 'lift' of 'filtered_rules'.

filtered_rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']]