# Apriori


The Apriori algorithm mines frequent itemsets from a transactional database and derives association rules from them. It is controlled by two thresholds, “support” and “confidence”. The support of an itemset is the fraction of transactions that contain it; the confidence of a rule A → B is the conditional probability of B given A, estimated as support(A ∪ B) / support(A).
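
As a small illustration of these two measures, the sketch below computes support and confidence on a handful of made-up transactions. The transactions and the helper names `support` and `confidence` are assumptions for illustration only; they are not part of mlxtend or of the basket data used later.

```python
# Toy transactions (hypothetical data, not taken from the basket dataset).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) = support(A | C) / support(A)."""
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / support(antecedent, transactions)

print(support({"bread"}, transactions))          # 0.75
print(support({"bread", "milk"}, transactions))  # 0.5
print(confidence({"bread"}, {"milk"}, transactions))
```

Here bread appears in three of four transactions, and two of those three also contain milk, so the confidence of bread → milk is 2/3.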

A key concept in the Apriori algorithm is the anti-monotonicity of the support measure: adding items to an itemset can never increase its support. It follows that

1. All subsets of a frequent itemset must be frequent.
2. Equivalently, all supersets of an infrequent itemset must be infrequent too.
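
The property can be checked directly on a small example. The following sketch, using made-up transactions and a hypothetical helper `subsets_at_least_as_frequent`, verifies that every proper subset of an itemset has support at least as high as the itemset itself.

```python
from itertools import combinations

# Hypothetical toy transactions, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def subsets_at_least_as_frequent(itemset, transactions):
    """Check that no proper non-empty subset has lower support than `itemset`."""
    base = support(itemset, transactions)
    items = sorted(itemset)
    return all(
        support(subset, transactions) >= base
        for k in range(1, len(items))
        for subset in combinations(items, k)
    )

print(subsets_at_least_as_frequent({"bread", "milk", "butter"}, transactions))  # True
```

This is what lets Apriori discard a candidate as soon as one of its subsets is known to be infrequent.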


###  Algorithm
The following are the main steps of the algorithm:

1. Calculate the support of itemsets of size k = 1 in the transactional database (support is the frequency of occurrence of an itemset). This is called generating the candidate set.
2. Prune the candidate set by eliminating itemsets with a support less than the given threshold.
3. Join the frequent itemsets to form candidate sets of size k + 1, and repeat the above steps until no more frequent itemsets can be formed. This happens when every set formed has a support less than the given threshold.
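
The three steps above can be sketched in plain Python as follows. This is a minimal illustration under simplifying assumptions (transactions held in memory as sets, no optimisation), not the implementation used by mlxtend; the function name `apriori_sketch` and the sample transactions are hypothetical.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Step 1: candidate itemsets of size k = 1.
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Step 2: prune candidates whose support is below the threshold.
        level = {}
        for candidate in candidates:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        # Step 3: join frequent k-itemsets into (k + 1)-candidates and keep
        # only those whose k-subsets are all frequent (anti-monotonicity).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        ]
    return frequent

# Hypothetical toy transactions, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori_sketch(transactions, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With a threshold of 0.5, the pair {milk, butter} is infrequent, so the triple {bread, milk, butter} is discarded without its support ever being counted.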

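### Libraries useful for Apriori

This notebook uses mlxtend for Apriori and association rules, pandas for data handling, and scikit-learn for preprocessing, data splitting and evaluation.

### Install the library for the Apriori algorithm using:
!pip install mlxtend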

In [10]:
import warnings
warnings.filterwarnings('ignore')
!pip3 install mlxtend



### Load the "basket" data

In [11]:
# Load dataset and display first five rows.
import pandas as pd
from sklearn import preprocessing
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import classification_report,confusion_matrix
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
df1 = pd.read_csv('BASKETS1n')
df1.head()

Unnamed: 0,cardid,value,pmethod,sex,homeown,income,age,fruitveg,freshmeat,dairy,cannedveg,cannedmeat,frozenmeal,beer,wine,softdrink,fish,confectionery
0,39808,42.7123,CHEQUE,M,NO,27000,46,F,T,T,F,F,F,F,F,F,F,T
1,67362,25.3567,CASH,F,NO,30000,28,F,T,F,F,F,F,F,F,F,F,T
2,10872,20.6176,CASH,M,NO,13200,36,F,F,F,T,F,T,T,F,F,T,F
3,26748,23.6883,CARD,F,NO,12200,26,F,F,T,F,F,F,F,T,F,F,F
4,91609,18.8133,CARD,M,YES,11000,24,F,F,F,F,F,F,F,F,F,F,F


### Perform pre-processing (if required)

In [12]:
# Select only the product columns and encode the 'F'/'T' flags as 0/1
df = df1[df1.columns[7:]].copy()  # copy to avoid SettingWithCopyWarning
le = preprocessing.LabelEncoder()
for col in df.columns:
    # LabelEncoder sorts the labels, so 'F' -> 0 and 'T' -> 1
    df[col] = le.fit_transform(df[col])
df.head()

Unnamed: 0,fruitveg,freshmeat,dairy,cannedveg,cannedmeat,frozenmeal,beer,wine,softdrink,fish,confectionery
0,0,1,1,0,0,0,0,0,0,0,1
1,0,1,0,0,0,0,0,0,0,0,1
2,0,0,0,1,0,1,1,0,0,1,0
3,0,0,1,0,0,0,0,1,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0


### Q1. Find frequent itemsets in the dataset using Apriori

In [13]:
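# Apriori with min support 0.1 (confidence is applied later, in association_rules).
# Without use_colnames=True, itemsets are reported as column indices.
associations = apriori(df, min_support=0.1)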

### Q2. Find the association rules in the dataset having min confidence 10%

In [14]:
# find rules
rules = association_rules(associations, metric="confidence", min_threshold=0.1)
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(0),(9),0.299,0.292,0.145,0.48495,1.660787,0.057692,1.374623
1,(9),(0),0.292,0.299,0.145,0.496575,1.660787,0.057692,1.392463
2,(3),(5),0.303,0.302,0.173,0.570957,1.890586,0.081494,1.626877
3,(5),(3),0.302,0.303,0.173,0.572848,1.890586,0.081494,1.631736
4,(3),(6),0.303,0.293,0.167,0.551155,1.881075,0.078221,1.575154
5,(6),(3),0.293,0.303,0.167,0.569966,1.881075,0.078221,1.620802
6,(5),(6),0.302,0.293,0.17,0.562914,1.921208,0.081514,1.61753
7,(6),(5),0.293,0.302,0.17,0.580205,1.921208,0.081514,1.662715
8,(10),(7),0.276,0.287,0.144,0.521739,1.817906,0.064788,1.490818
9,(7),(10),0.287,0.276,0.144,0.501742,1.817906,0.064788,1.453063
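
The derived columns in this table follow from the three support values. As a check, the sketch below recomputes confidence, lift, leverage and conviction for rule 0 from its antecedent support, consequent support and joint support, using the standard definitions of these metrics. The helper name `rule_metrics` is hypothetical.

```python
def rule_metrics(antecedent_support, consequent_support, support):
    """Derive the usual rule metrics from the three support values."""
    confidence = support / antecedent_support
    lift = confidence / consequent_support
    leverage = support - antecedent_support * consequent_support
    # Conviction is infinite for rules that are never violated.
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - consequent_support) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }

# Rule 0 from the table above: (0) -> (9)
metrics = rule_metrics(antecedent_support=0.299, consequent_support=0.292, support=0.145)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

The printed values agree with row 0 of the table up to rounding. A lift above 1 indicates that the antecedent and consequent occur together more often than they would if they were independent.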


### Q3. Find association rules having minimum antecedent_len 2 & confidence greater than 0.75

In [15]:
#rules having minimum antecedent_len 2 and confidence greater than 0.75
rules["antecedent_len"] = rules["antecedents"].apply(len)
print(rules[(rules['antecedent_len'] >= 2) & (rules['confidence'] > 0.75)])

   antecedents consequents  antecedent support  consequent support  support  \
10      (3, 5)         (6)               0.173               0.293    0.146   
11      (3, 6)         (5)               0.167               0.302    0.146   
12      (5, 6)         (3)               0.170               0.303    0.146   

    confidence      lift  leverage  conviction  antecedent_len  
10    0.843931  2.880309  0.095311    4.530037               2  
11    0.874251  2.894873  0.095566    5.550762               2  
12    0.858824  2.834401  0.094490    4.937083               2  


### Load the "zoo" data

In [16]:
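# load the dataset and display first five rows
# Note: zoo.data as distributed by UCI has no header row, so read_csv treats the
# first record as the header and that record is lost when the columns are renamed
# below; pd.read_csv(..., header=None) would keep it.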
zoo = pd.read_csv('Zoo Dataset/zoo.data')
zoo.columns = ["animal name","hair","feathers","eggs","milk","airborne","aquatic","predator","toothed","backbone","breathes","venomous","fins","legs","tail","domestic","catsize","type"]
zoo.head()

Unnamed: 0,animal name,hair,feathers,eggs,milk,airborne,aquatic,predator,toothed,backbone,breathes,venomous,fins,legs,tail,domestic,catsize,type
0,antelope,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1
1,bass,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4
2,bear,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1,1
3,boar,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
4,buffalo,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1


### Q4. Perform pre-processing (if required)

In [17]:
# Drop the first column (animal name), then one-hot encode every remaining
# column, including legs and the class type. Each encoded column is replaced
# by its dummy columns, which are appended at the end in the order listed.
df_zoo = zoo[zoo.columns[1:]]

encode_order = ["legs", "hair", "feathers", "eggs", "milk", "airborne",
                "aquatic", "toothed", "predator", "backbone", "breathes",
                "venomous", "fins", "tail", "domestic", "catsize", "type"]
for col in encode_order:
    dummies = pd.get_dummies(df_zoo[col], prefix=col.upper())
    df_zoo = pd.concat([df_zoo.drop(columns=col), dummies], axis=1)

df_zoo.head()

Unnamed: 0,LEGS_0,LEGS_2,LEGS_4,LEGS_5,LEGS_6,LEGS_8,HAIR_0,HAIR_1,FEATHERS_0,FEATHERS_1,...,DOMESTIC_1,CATSIZE_0,CATSIZE_1,TYPE_1,TYPE_2,TYPE_3,TYPE_4,TYPE_5,TYPE_6,TYPE_7
0,0,0,1,0,0,0,0,1,1,0,...,0,0,1,1,0,0,0,0,0,0
1,1,0,0,0,0,0,1,0,1,0,...,0,1,0,0,0,0,1,0,0,0
2,0,0,1,0,0,0,0,1,1,0,...,0,0,1,1,0,0,0,0,0,0
3,0,0,1,0,0,0,0,1,1,0,...,0,0,1,1,0,0,0,0,0,0
4,0,0,1,0,0,0,0,1,1,0,...,0,0,1,1,0,0,0,0,0,0
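
To make the encoding step concrete, here is a minimal pure-Python sketch of what one-hot encoding does to a single column. It is only an illustration of the idea behind `pd.get_dummies`, not its implementation; the helper `one_hot` is hypothetical, and the input is the `legs` column of the five rows displayed earlier.

```python
def one_hot(values, prefix):
    """Turn a list of category values into one 0/1 indicator per category."""
    categories = sorted(set(values))
    return [
        {f"{prefix}_{c}": int(v == c) for c in categories}
        for v in values
    ]

legs = [4, 0, 4, 4, 4]  # legs of antelope, bass, bear, boar, buffalo
encoded = one_hot(legs, "LEGS")
print(encoded[0])  # {'LEGS_0': 0, 'LEGS_4': 1}
print(encoded[1])  # {'LEGS_0': 1, 'LEGS_4': 0}
```

Each row ends up with exactly one indicator set to 1 per original column, which is the transaction-style 0/1 layout that Apriori expects.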


### Q5. Find frequent itemsets in zoo dataset having min support 0.5 

In [18]:
# Apriori with min support 0.5 (confidence is applied later, in association_rules)
associations_zoo = apriori(df_zoo,min_support = 0.5) 
associations_zoo

Unnamed: 0,support,itemsets
0,0.58,(6)
1,0.80,(8)
2,0.59,(11)
3,0.60,(12)
4,0.76,(14)
...,...,...
178,0.50,"(8, 14, 19, 23, 31)"
179,0.51,"(16, 23, 25, 26, 28)"
180,0.56,"(23, 25, 26, 28, 31)"
181,0.51,"(32, 23, 25, 26, 28)"


### Q6. Find frequent association rules having min confidence 0.5

In [19]:
# Find and display rules
rules_zoo = association_rules(associations_zoo, metric="confidence", min_threshold=0.5)
print(rules_zoo)

     antecedents       consequents  antecedent support  consequent support  \
0           (11)               (6)                0.59                0.58   
1            (6)              (11)                0.58                0.59   
2           (12)               (6)                0.60                0.58   
3            (6)              (12)                0.58                0.60   
4           (26)               (6)                0.92                0.58   
...          ...               ...                 ...                 ...   
1227        (32)  (25, 26, 31, 23)                0.87                0.59   
1228        (23)  (32, 25, 26, 31)                0.82                0.50   
1229        (25)  (32, 26, 31, 23)                0.79                0.61   
1230        (26)  (32, 25, 31, 23)                0.92                0.51   
1231        (31)  (32, 25, 26, 23)                0.75                0.55   

      support  confidence      lift  leverage  conviction  
0  

### Q7. Convert the dataset into two classes "Mammal" and "others"

In [20]:
# Take mammal class column as the class column and drop others.
df_zoo.drop(["TYPE_2","TYPE_3","TYPE_4","TYPE_5","TYPE_6","TYPE_7"],axis=1, inplace=True)
df_zoo.head()

Unnamed: 0,LEGS_0,LEGS_2,LEGS_4,LEGS_5,LEGS_6,LEGS_8,HAIR_0,HAIR_1,FEATHERS_0,FEATHERS_1,...,VENOMOUS_1,FINS_0,FINS_1,TAIL_0,TAIL_1,DOMESTIC_0,DOMESTIC_1,CATSIZE_0,CATSIZE_1,TYPE_1
0,0,0,1,0,0,0,0,1,1,0,...,0,1,0,0,1,1,0,0,1,1
1,1,0,0,0,0,0,1,0,1,0,...,0,0,1,0,1,1,0,1,0,0
2,0,0,1,0,0,0,0,1,1,0,...,0,1,0,1,0,1,0,0,1,1
3,0,0,1,0,0,0,0,1,1,0,...,0,1,0,0,1,1,0,0,1,1
4,0,0,1,0,0,0,0,1,1,0,...,0,1,0,0,1,1,0,0,1,1


### Q8. Partition the dataset into training and testing part (70:30)

In [21]:
#partition the data
X = df_zoo
X_train, X_test = train_test_split(X, test_size=0.30, random_state=30)

### Q9. Generate association rules for "mammal" class (training data) with min support 0.4 and confidence as 1

In [22]:
# frequent itemsets 
frequent_itemsets = apriori(X_train, min_support=0.4,use_colnames=True)
frequent_itemsets

Unnamed: 0,support,itemsets
0,0.585714,(HAIR_0)
1,0.414286,(HAIR_1)
2,0.800000,(FEATHERS_0)
3,0.400000,(EGGS_0)
4,0.600000,(EGGS_1)
...,...,...
564,0.471429,"(VENOMOUS_0, TAIL_1, AQUATIC_0, FINS_0, BACKBO..."
565,0.414286,"(VENOMOUS_0, AQUATIC_0, FINS_0, DOMESTIC_0, BA..."
566,0.400000,"(TAIL_1, AQUATIC_0, FINS_0, DOMESTIC_0, BACKBO..."
567,0.457143,"(VENOMOUS_0, TAIL_1, FINS_0, DOMESTIC_0, BACKB..."


In [23]:
# find frequent rules
rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=1)
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(HAIR_1),(FEATHERS_0),0.414286,0.800000,0.414286,1.0,1.250000,0.082857,inf
1,(HAIR_1),(BREATHES_1),0.414286,0.757143,0.414286,1.0,1.320755,0.100612,inf
2,(EGGS_0),(FEATHERS_0),0.400000,0.800000,0.400000,1.0,1.250000,0.080000,inf
3,(MILK_1),(FEATHERS_0),0.400000,0.800000,0.400000,1.0,1.250000,0.080000,inf
4,(TOOTHED_1),(FEATHERS_0),0.600000,0.800000,0.600000,1.0,1.250000,0.120000,inf
...,...,...,...,...,...,...,...,...,...
723,"(VENOMOUS_0, TAIL_1, FINS_0, DOMESTIC_0, BREAT...",(BACKBONE_1),0.457143,0.814286,0.457143,1.0,1.228070,0.084898,inf
724,"(DOMESTIC_0, VENOMOUS_0, FINS_0, TAIL_1)","(BACKBONE_1, BREATHES_1)",0.457143,0.657143,0.457143,1.0,1.521739,0.156735,inf
725,"(VENOMOUS_0, FEATHERS_0, TAIL_1, AIRBORNE_0, D...",(BACKBONE_1),0.400000,0.814286,0.400000,1.0,1.228070,0.074286,inf
726,"(VENOMOUS_0, TAIL_1, AIRBORNE_0, DOMESTIC_0, B...",(FEATHERS_0),0.400000,0.800000,0.400000,1.0,1.250000,0.080000,inf


In [24]:
# selecting rules having consequents as class mammal
rules[(rules["consequents"] == {"TYPE_1"})]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
12,(MILK_1),(TYPE_1),0.4,0.4,0.4,1.0,2.5,0.24,inf
51,"(MILK_1, FEATHERS_0)",(TYPE_1),0.4,0.4,0.4,1.0,2.5,0.24,inf
93,"(MILK_1, BACKBONE_1)",(TYPE_1),0.4,0.4,0.4,1.0,2.5,0.24,inf
101,"(MILK_1, BREATHES_1)",(TYPE_1),0.4,0.4,0.4,1.0,2.5,0.24,inf
106,"(MILK_1, VENOMOUS_0)",(TYPE_1),0.4,0.4,0.4,1.0,2.5,0.24,inf
187,"(MILK_1, BACKBONE_1, FEATHERS_0)",(TYPE_1),0.4,0.4,0.4,1.0,2.5,0.24,inf
206,"(MILK_1, FEATHERS_0, BREATHES_1)",(TYPE_1),0.4,0.4,0.4,1.0,2.5,0.24,inf
216,"(MILK_1, FEATHERS_0, VENOMOUS_0)",(TYPE_1),0.4,0.4,0.4,1.0,2.5,0.24,inf
301,"(MILK_1, BACKBONE_1, BREATHES_1)",(TYPE_1),0.4,0.4,0.4,1.0,2.5,0.24,inf
312,"(MILK_1, BACKBONE_1, VENOMOUS_0)",(TYPE_1),0.4,0.4,0.4,1.0,2.5,0.24,inf


### Q10. Test the rules generated on testing dataset and find precision and recall for the rule based classifier

In [27]:
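# Apply the rules on the test data. Only rules whose consequent is the mammal
# class are used; using every rule would predict "mammal" whenever any rule
# fires, and rules containing TYPE_1 in the antecedent would leak the label.
mammal_rules = rules[rules["consequents"] == {"TYPE_1"}]
X_test = X_test.copy()  # avoid modifying a slice of df_zoo
X_test['predicted'] = 0
for index, row in X_test.iterrows():
    for rule in mammal_rules['antecedents']:
        # the rule fires when every item in its antecedent is present
        if all(row[col] != 0 for col in rule):
            X_test.at[index, 'predicted'] = 1
            break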

In [28]:
# evaluation measures

In [29]:
# print classification report for the rule-based classifier
y_true = X_test['TYPE_1']      # actual class (1 = mammal)
y_pred = X_test['predicted']   # class predicted by the rules
print(classification_report(y_true, y_pred))
print("Confusion Matrix")
print(confusion_matrix(y_true, y_pred))
print("\n Accuracy")
accuracy = accuracy_score(y_true, y_pred)
print(accuracy)
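
The idea behind the rule-based classifier and its evaluation can be sketched without pandas. In the example below, the two antecedents are taken from the mammal rules listed under Q9, but the four rows are made up for illustration, so the printed precision and recall say nothing about the actual zoo test set. The helpers `predict_mammal` and `precision_recall` are hypothetical names.

```python
def predict_mammal(row, antecedents):
    """Return 1 if every item of at least one antecedent is present in the row."""
    return int(any(all(row.get(item, 0) == 1 for item in ant) for ant in antecedents))

def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Antecedents of two of the mammal rules found above.
antecedents = [frozenset({"MILK_1"}), frozenset({"MILK_1", "BACKBONE_1"})]

# Hypothetical test rows (not taken from the zoo data).
rows = [
    {"MILK_1": 1, "BACKBONE_1": 1, "TYPE_1": 1},
    {"MILK_1": 0, "BACKBONE_1": 1, "TYPE_1": 0},
    {"MILK_1": 0, "BACKBONE_1": 0, "TYPE_1": 0},
    {"MILK_1": 1, "BACKBONE_1": 1, "TYPE_1": 1},
]
y_true = [row["TYPE_1"] for row in rows]
y_pred = [predict_mammal(row, antecedents) for row in rows]
print(y_pred)
print(precision_recall(y_true, y_pred))
```

Note that the label column `TYPE_1` is only read to build `y_true`; it never appears in an antecedent, so the prediction does not depend on it.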

### Q11. Apply decision tree on the dataset and calculate the performance evaluation measures

In [117]:
# Select the independent variables and target column
X = df_zoo[df_zoo.columns[:-1]]
Y = df_zoo['TYPE_1']
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.30, random_state = 30)

In [118]:
# Apply decision tree
from sklearn.tree import DecisionTreeClassifier
dtree = DecisionTreeClassifier()
dtree.fit(X_train,Y_train)

DecisionTreeClassifier()

In [119]:
# Find predictions by decision tree
predictions = dtree.predict(X_test)

In [120]:
# Evaluation measures and classification report
print(classification_report(Y_test,predictions))
print("Confusion Matrix")
print(confusion_matrix(Y_test,predictions))
print("\n Accuracy")
accuracy = accuracy_score(Y_test, predictions)
print(accuracy)

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        18
           1       1.00      1.00      1.00        12

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

Confusion Matrix
[[18  0]
 [ 0 12]]

 Accuracy
1.0
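
The figures in the report can be recovered from the confusion matrix alone. The sketch below assumes the scikit-learn layout (rows are actual classes, columns are predicted classes, class 0 first) and uses a hypothetical helper `metrics_from_confusion`.

```python
def metrics_from_confusion(matrix):
    """Precision, recall (for class 1) and accuracy from a 2x2 confusion matrix."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tn + tp) / total
    return precision, recall, accuracy

# Confusion matrix reported above for the decision tree.
print(metrics_from_confusion([[18, 0], [0, 12]]))  # (1.0, 1.0, 1.0)
```

With no off-diagonal entries, all 30 test animals are classified correctly, which is why every score in the report is 1.00.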


### Q12. Which of the two classifiers performs better?

In [None]:
# Name of the classifier with accuracy value.