
An organization wanted to mine association rules of frequently bought items from its stores and suggest some recommendations to its customers.
 
As a data scientist, you are required to recognize patterns in the available data and evaluate the efficacy of the methods used to obtain them. Your activities should include preparing the dataset for analysis; investigating the relationships in the data set with visualization; identifying frequent patterns; formulating association rules; and evaluating the quality of the rules.
 
Demonstrate the KDD process with the following activities:
*   Problem statement
*   Perform exploratory data analysis
*   Preprocess the data. Identify relevant & irrelevant attributes for the problem.
*   Propose parameters such as support, confidence etc.  
*   Discover frequent patterns
*   Iterate previous steps by varying parameters
*   Formulate association rules
*   Compare association rules
*   Briefly explain importance of discovered rules
 
The following are some points to take note of while doing the assignment:
*   The data in some of the rows in the data set may be noisy or sparse.
*   State all your assumptions clearly
*   Provide clear explanations to explain your stand

# Submission Guidelines

* You have to use Python for the implementation. 
* You are free to use ML/Python libraries while coding the assignment.
* You need to upload your submission (solution, source code, dataset, etc.) as a zip file, along with enough documentation for each step performed and the reason for it. 
* You must submit a supporting Word document or PPT along with the solution, explaining the problem statement, process, structure, approach, analysis, your decisions based on the data, and comments on accuracy/findings, etc. for your allotted assignment project. 
* The assignment file to be uploaded must be of type .zip only, i.e. Windows Zip format. Other formats will be rejected and not evaluated.
* The name of the assignment solution file must be the Group ID, for example DM_GROUP001.zip

## Problem Statement:

We are given 7,501 transactions from various customers, each of which includes one or more items.  The task is to build a recommendation engine, so that a customer can be shown which other items customers frequently buy together with a given item.  

Approach:

This is the market basket analysis problem studied during the course, where we learnt about association rules and the Apriori and FP-Growth (frequent pattern growth) algorithms.  

* Recognize patterns from the given data
* Evaluate efficacy of methods to obtain patterns

## Perform exploratory data analysis

Exploratory Data Analysis refers to the process of performing initial investigations on data in order to discover patterns, spot anomalies, test hypotheses and check assumptions with the help of summary statistics and graphical representations.


Observations:

df1 = pd.read_excel("/content/dma1/Dataset.xlsx", header=None, keep_default_na=False)

* The Excel file (dataset) is read into a pandas dataframe. The data is sparse (many blank cells) and has no header row, so header=None is passed; keep_default_na=False keeps blank cells as empty strings instead of converting them to NaN.

df1.head()
* The first 5 records show that there are no column names and that transactions do not share a common set of items. Each row is one transaction with one item per cell, and some rows contain only one item.

df1.info()
* This function is used to get a brief summary of the dataframe. 
* This method prints information about a DataFrame including the index dtype and column dtypes, non-null values, and memory usage.

df1.shape 
* It shows the number of dimensions as well as the size in each dimension. Since data frames are two-dimensional, what shape returns is the number of rows and columns.

df1.size 
* Return an int representing the number of elements in this object. Return the number of rows if Series, otherwise returns the number of rows times the number of columns if DataFrame.

df1.ndim 
* Returns dimension of dataframe/series. 1 for one dimension (series), 2 for two dimensions (dataframe).

df1.describe() 
* Returns summary statistics for each column of the dataset. 
* For numerical columns these are measures such as percentiles, mean and standard deviation; for the text columns in this dataset it reports count, unique, top and freq instead.

df1.sample() 
* Used to generate a sample randomly either row or column. 
* It allows you to select values randomly from a Series or DataFrame. It is useful when we want to select a random sample from a distribution.


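df1.isnull().sum()
* Returns the number of missing (NaN) values in each column. Because blank cells were kept as empty strings (keep_default_na=False), every count here is 0.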

df1.nunique()
* Return number of unique elements in the object. It counts the number of unique entries over columns or rows. It is very useful in categorical features especially in cases where we do not know the number of categories beforehand.

df = df1.copy()
df.dropna()
* dropna() returns a copy of the dataframe without the rows (or columns) that contain NaN or missing values. Since this dataset holds empty strings rather than NaN, no rows are dropped here.

df1.index
* Returns the index (row labels) of the dataframe; here it is a RangeIndex running from 0 to 7500.

df1.columns 
* Return the column labels of the dataframe.

df1.memory_usage()
* Returns how much memory each column uses in bytes. It is useful especially when we work with large data frames.


## Libraries used and their function
* Pandas
* Mlxtend



In [2]:
# Exploratory Data Analysis
# Total 7501 rows are present in the given Excel file


!git clone https://github.com/nnrtgt/dma1

Cloning into 'dma1'...
remote: Enumerating objects: 15, done.
remote: Counting objects: 100% (15/15), done.
remote: Compressing objects: 100% (14/14), done.
remote: Total 15 (delta 3), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (15/15), done.


In [3]:
!ls

dma1  sample_data


In [4]:
import pandas as pd

In [5]:
!ls dma1


199DataMiningAssignment.ipynb  Dataset.xlsx


In [6]:
df1 = pd.read_excel("/content/dma1/Dataset.xlsx", header=None, keep_default_na=False)

#df2 = pd.read_excel("https://github.com/nnrtgt/dma1/blob/main/Dataset.xlsx", header=None)

In [7]:
df1.head()
# The first 5 records show that there are no column names and that transactions do not share a common set of items. Some rows contain only one item.

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


In [8]:
df1.info
# info is a method: written without parentheses, as here, the cell only displays the bound method together with the dataframe repr.
# Calling df1.info() prints a brief summary of the dataframe, including the index dtype, column dtypes, non-null counts, and memory usage.

<bound method DataFrame.info of                  0                  1            2                 3   \
0            shrimp            almonds      avocado    vegetables mix   
1           burgers          meatballs         eggs                     
2           chutney                                                     
3            turkey            avocado                                  
4     mineral water               milk   energy bar  whole wheat rice   
...             ...                ...          ...               ...   
7496         butter         light mayo  fresh bread                     
7497        burgers  frozen vegetables         eggs      french fries   
7498        chicken                                                     
7499       escalope          green tea                                  
7500           eggs    frozen smoothie  yogurt cake    low fat yogurt   

                4                 5     6               7             8   \
0     green gra

In [9]:
df1.shape #: It shows the number of dimensions as well as the size in each dimension. Since data frames are two-dimensional, what shape returns is the number of rows and columns.

(7501, 20)

In [10]:
df1.size #: Return an int representing the number of elements in this object. Return the number of rows if Series, otherwise returns the number of rows times the number of columns if DataFrame.

150020

In [11]:
df1.ndim #: Returns dimension of dataframe/series. 1 for one dimension (series), 2 for two dimensions (dataframe).

2

In [12]:
df1.describe() 
#: Returns summary statistics for each column of the dataset. 
#: For numerical columns these are measures such as percentiles, mean and standard deviation; for the text columns here it reports count, unique, top and freq.

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
count,7501,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0
unique,115,118.0,116.0,115.0,111.0,107.0,103.0,98.0,89.0,81.0,67.0,51.0,44.0,29.0,20.0,9.0,4.0,4.0,4.0,2.0
top,mineral water,,,,,,,,,,,,,,,,,,,
freq,577,1754.0,3112.0,4156.0,4972.0,5637.0,6132.0,6520.0,6847.0,7106.0,7245.0,7347.0,7414.0,7454.0,7476.0,7493.0,7497.0,7497.0,7498.0,7500.0


In [13]:
df1.sample() 
#: Used to generate a sample randomly either row or column. 
#It allows you to select values randomly from a Series or DataFrame. It is useful when we want to select a random sample from a distribution.

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
5858,fresh tuna,ground beef,pepper,mineral water,pancakes,eggs,whole wheat rice,cake,low fat yogurt,,,,,,,,,,,


In [14]:
df1.isnull().sum()
#: Returns the number of missing (NaN) values in each column. Blank cells were kept as empty strings (keep_default_na=False), so every count is 0.

0     0
1     0
2     0
3     0
4     0
5     0
6     0
7     0
8     0
9     0
10    0
11    0
12    0
13    0
14    0
15    0
16    0
17    0
18    0
19    0
dtype: int64

In [15]:
df1.nunique()
#: Return number of unique elements in the object. It counts the number of unique entries over columns or rows. It is very useful in categorical features especially in cases where we do not know the number of categories beforehand.

0     115
1     118
2     116
3     115
4     111
5     107
6     103
7      98
8      89
9      81
10     67
11     51
12     44
13     29
14     20
15      9
16      4
17      4
18      4
19      2
dtype: int64

In [16]:
df1.index
#: Returns the index (row labels) of the dataframe; here it is a RangeIndex running from 0 to 7500.

RangeIndex(start=0, stop=7501, step=1)

In [17]:
df1.columns 
#: Return the column labels of the dataframe.

Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
            19],
           dtype='int64')

In [18]:
df1.memory_usage()
#: Returns how much memory each column uses in bytes. It is useful especially when we work with large data frames.

Index      128
0        60008
1        60008
2        60008
3        60008
4        60008
5        60008
6        60008
7        60008
8        60008
9        60008
10       60008
11       60008
12       60008
13       60008
14       60008
15       60008
16       60008
17       60008
18       60008
19       60008
dtype: int64

In [19]:
df = df1.copy()
df.dropna()
#: dropna() returns a new dataframe without the rows that contain NaN or missing values; it does not modify df in place.
#: Blank cells were read as empty strings rather than NaN, so no rows are removed here.

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7496,butter,light mayo,fresh bread,,,,,,,,,,,,,,,,,
7497,burgers,frozen vegetables,eggs,french fries,magazines,green tea,,,,,,,,,,,,,,
7498,chicken,,,,,,,,,,,,,,,,,,,
7499,escalope,green tea,,,,,,,,,,,,,,,,,,


In [20]:
!pip install mlxtend

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


## Data PreProcessing 
* Identify relevant & irrelevant attributes for the problem.

Here we are given an Excel file (a dataset of market basket data) in the form of a list of transactions, each transaction having one or more items. Each item is represented by a string (the item name).

In order to mine association rules, the data has to be represented in a binary format as shown below.  

| TID | Bread | Milk | Egg | Diapers |
| --- | --- | --- | --- | --- |
| 1 | 1 | 0 | 1 | 0 |
| 2 | 1 | 1 | 1 | 0 |
| 3 | 0 | 1 | 0 | 0 |

Each row corresponds to a transaction and each column corresponds to an item.  An item can be treated as a binary variable whose value is one if the item is present in a transaction and zero otherwise.  Because the presence of an item in a transaction is often considered more important than its absence, an item is an asymmetric binary variable.  This representation is a simplistic view of real market basket data because it ignores important aspects of the data such as the quantity of items sold or the price paid to purchase them.  

This is also known as the one-hot encoding technique.  We used the **mlxtend.preprocessing** module, which provides **TransactionEncoder** to perform this step.

* Converting the given data with TransactionEncoder produces one column per distinct item (instead of the original 20 positional columns); each value is True if the item is present in the transaction and False otherwise
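
To make the encoding concrete, here is a minimal plain-Python sketch of the same idea on a toy basket list. It is only an illustration of one-hot encoding of transactions, not mlxtend's implementation, and the item names are hypothetical.

```python
# Hypothetical toy transactions; the item names are illustrative only.
transactions = [
    ["bread", "eggs"],
    ["bread", "milk", "eggs"],
    ["milk"],
]

# One column per distinct item, in sorted order.
columns = sorted({item for t in transactions for item in t})

# One row per transaction: True where the item is present, False otherwise.
encoded = [[item in t for item in columns] for t in transactions]

print(columns)
for row in encoded:
    print(row)
```

The number of columns depends on how many distinct items exist, not on the length of the longest transaction.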


In [21]:
# 1 Hot Encoding or TransactionEncoder - http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/
# http://rasbt.github.io/mlxtend/user_guide/preprocessing/TransactionEncoder/
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
dataset=df1.values.tolist()
te_ary = te.fit(dataset).transform(dataset)
# te_ary.astype("int"). -- not needed
df = pd.DataFrame(te_ary, columns=te.columns_)

# all the blank cells were encoded as a single column named '' (empty string); that column can be removed
df = df.drop('', axis=1)
df
# te.columns_


Unnamed: 0,almonds,antioxydant juice,asparagus,avocado,babies food,bacon,barbecue sauce,black tea,blueberries,body spray,...,turkey,vegetables mix,water spray,white wine,whole weat flour,whole wheat pasta,whole wheat rice,yams,yogurt cake,zucchini
0,True,True,False,True,False,False,False,False,False,False,...,False,True,False,False,True,False,False,True,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,True,False,False,False,False,False,False,...,True,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,True,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7496,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
7497,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
7498,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
7499,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False


In [22]:
!pip install mlxtend --upgrade

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting mlxtend
  Downloading mlxtend-0.20.0-py2.py3-none-any.whl (1.3 MB)
Installing collected packages: mlxtend
  Attempting uninstall: mlxtend
    Found existing installation: mlxtend 0.14.0
    Uninstalling mlxtend-0.14.0:
      Successfully uninstalled mlxtend-0.14.0
Successfully installed mlxtend-0.20.0


## Propose parameters such as support, confidence etc.

Itemset and Support Count:

Let I = {i1, i2,...,id} be the set of all items in a market basket data and T = {t1, t2, ...,tn} be the set of all transactions.  Each transaction ti contains a subset of items chosen from I.  In association analysis, a collection of zero or more items is termed an *itemset*.  If an itemset contains k items, it is called a k-itemset.  The null (or empty) set is an itemset that does not contain any items.  

A transaction tj is said to contain an itemset X if X is a subset of tj.  An important property of an itemset is its support count, which is the number of transactions that contain that itemset.  

Support is the fraction of transactions in which an itemset occurs, that is, its support count divided by the total number of transactions.

An itemset X is called frequent if its support s(X) meets or exceeds a user-defined threshold, minsup (minimum support).

An association rule is an implication expression of the form X --> Y, where X and Y are disjoint itemsets.  

The strength of an association rule can be measured in terms of its support and confidence.  Support determines how often a rule is applicable to a given data set, while confidence determines how frequently items in Y appear in transactions that contain X.  

Support is an important measure because a rule that has very low support might occur simply by chance.

Confidence measures the reliability of the inference made by a rule.  It also provides an estimate of the conditional probability of Y given X.
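
The definitions of support count, support and confidence can be checked on a small example. The sketch below uses a hypothetical five-transaction dataset and helper functions named here for illustration only; it is not part of the assignment code.

```python
# Hypothetical toy dataset; each transaction is a set of item names.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "eggs"},
    {"milk", "diapers"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "eggs"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = support_count(antecedent | consequent, transactions)
    return both / support_count(antecedent, transactions)

x, y = {"bread"}, {"milk"}
print(support_count(x, transactions))   # transactions containing bread
print(support(x | y, transactions))     # support of the rule bread -> milk
print(confidence(x, y, transactions))   # confidence of the rule bread -> milk
```

Here bread occurs in 4 of the 5 transactions and bread with milk in 3, so the rule {bread} --> {milk} has support 3/5 and confidence 3/4.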

## Discover frequent patterns

FP-Growth has the following advantages over the Apriori algorithm, so we use it here:
* It avoids repeated database scans; it scans the data only twice
* It uses a tree structure to store all the information

Here we used the FP-Growth algorithm to discover the frequent itemsets. The algorithm leverages a compact data structure called an FP-tree and extracts frequent itemsets directly from this structure.  

### FP Tree Representation:

An FP-tree is a compressed representation of the input data.  It is constructed by reading the data set one transaction at a time and mapping each transaction onto a path in the FP-tree.  As different transactions can have several items in common, their paths may overlap, and this overlap is what makes the tree compact.  

Once the FP-tree is constructed, we generate the frequent itemsets from it.

We used the mlxtend library to accomplish this, so most of these details are abstracted away.
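
Since the library hides the tree, a small sketch may help show how the two scans build it. The code below is a simplified illustration of FP-tree construction only (the mining step is omitted), written under the assumption of a toy dataset; `FPNode`, `build_fp_tree` and `count_nodes` are hypothetical names and this is not how mlxtend implements it internally.

```python
from collections import Counter

class FPNode:
    """One node of the FP-tree: an item, its count, and its children."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_count):
    # First scan: count in how many transactions each item occurs.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, c in counts.items() if c >= min_count}

    root = FPNode(None)
    # Second scan: insert each transaction as a path starting at the root.
    for t in transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-counts[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1   # shared prefixes just increment the count
            node = child
    return root

def count_nodes(node):
    """Total number of nodes in the tree, including the root."""
    return 1 + sum(count_nodes(c) for c in node.children.values())

# Hypothetical toy transactions.
transactions = [
    ["mineral water", "eggs"],
    ["mineral water", "eggs", "spaghetti"],
    ["mineral water", "spaghetti"],
    ["eggs"],
    ["mineral water", "chutney"],
]
tree = build_fp_tree(transactions, min_count=2)
print(tree.children["mineral water"].count, count_nodes(tree))
```

Four of the five toy transactions start with the same item, so they share a single first-level node; that sharing is the compression described above.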

In [53]:
from mlxtend.frequent_patterns import fpgrowth

fpgrowth(df, min_support=0.15)

# Five itemsets, each a single item, meet the minimum support of 15%; without use_colnames=True they are shown as column indices

Unnamed: 0,support,itemsets
0,0.238368,(71)
1,0.179709,(36)
2,0.170911,(42)
3,0.17411,(99)
4,0.163845,(24)


## Iterate previous steps by varying parameters

We tried different combinations of confidence, support and lift thresholds.

Several itemsets have a support of around 15%, so we tried values between 3% and 23% for the support threshold.  
The minimum support metric is 0.03, or 3%.
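
The effect of the support threshold can be seen by sweeping it over a few values and counting the itemsets that survive. The sketch below does this with a brute-force search on a hypothetical toy dataset; `frequent_itemsets` here is an illustrative helper, not the mlxtend function, and brute force is only practical for very small data.

```python
from itertools import combinations

# Hypothetical toy transactions.
transactions = [
    {"mineral water", "eggs"},
    {"mineral water", "eggs", "spaghetti"},
    {"mineral water", "spaghetti"},
    {"eggs"},
    {"mineral water", "chutney"},
]

def frequent_itemsets(transactions, min_support, max_len=2):
    """Brute-force search: return {itemset: support} for itemsets of up to max_len items."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            s = sum(1 for t in transactions if set(combo) <= t) / n
            if s >= min_support:
                result[frozenset(combo)] = s
    return result

# Raising the threshold shrinks the set of frequent itemsets.
for threshold in (0.2, 0.4, 0.6, 0.8):
    found = frequent_itemsets(transactions, threshold)
    print(threshold, len(found))
```

A low threshold keeps many itemsets, including pairs, while a high one leaves only the most common single items, which mirrors what happens at 23% on the real data.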

In [56]:
fpgrowth(df, min_support=0.23)
# Only one itemset, (71), meets the minimum support of 23%; it contains the single item "mineral water" (column index 71)

Unnamed: 0,support,itemsets
0,0.238368,(71)


In [57]:
fpgrowth(df, min_support=0.23, use_colnames=True)

Unnamed: 0,support,itemsets
0,0.238368,(mineral water)


In [69]:
#http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/
frequent_itemsets = fpgrowth(df, min_support=0.05, use_colnames=True)

In [72]:
from mlxtend.frequent_patterns import association_rules

association_rules(frequent_itemsets, metric="confidence", min_threshold=0.22) 

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(eggs),(mineral water),0.179709,0.238368,0.050927,0.283383,1.188845,0.00809,1.062815
1,(spaghetti),(mineral water),0.17411,0.238368,0.059725,0.343032,1.439085,0.018223,1.159314
2,(mineral water),(spaghetti),0.238368,0.17411,0.059725,0.250559,1.439085,0.018223,1.102008
3,(chocolate),(mineral water),0.163845,0.238368,0.05266,0.3214,1.348332,0.013604,1.122357
4,(mineral water),(chocolate),0.238368,0.163845,0.05266,0.220917,1.348332,0.013604,1.073256


## Association Rule

{eggs} -> {mineral water}

* Antecedent
* Consequent
* Both Antecedent and Consequent can have multiple items
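* Support - how often the rule applies in the dataset; it can be high or low
* Confidence - reliability of the rule. The acceptable level depends on the domain: for product recommendation even 0.5 (50%) confidence is acceptable, whereas in a medical situation this level is not high enough
* Lift - ratio of the observed support to the support expected if the antecedent and consequent were completely independent
* The basic requirement for a useful rule is Lift > 1, meaning the items occur together more often than expected by chance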
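
As a worked check of these measures, the sketch below recomputes confidence, lift, leverage and conviction for {eggs} -> {mineral water} from the three support values in the rules table shown earlier. The formulas are the standard definitions; the variable names are chosen here for illustration.

```python
# Values copied from the {eggs} -> {mineral water} row of the rules table.
antecedent_support = 0.179709   # support(eggs)
consequent_support = 0.238368   # support(mineral water)
rule_support = 0.050927         # support(eggs and mineral water together)

# confidence = support(X and Y) / support(X)
confidence = rule_support / antecedent_support
# lift = confidence / support(Y); 1.0 would mean X and Y are independent
lift = confidence / consequent_support
# leverage = observed support minus the support expected under independence
leverage = rule_support - antecedent_support * consequent_support
# conviction = (1 - support(Y)) / (1 - confidence)
conviction = (1 - consequent_support) / (1 - confidence)

print(round(confidence, 3), round(lift, 3), round(leverage, 3), round(conviction, 3))
```

The results agree with the confidence, lift, leverage and conviction columns of that row to about three decimal places; small differences come from the rounded inputs.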

## Formulate association rules

The generated association rules are:

{eggs} --> {mineral water}, {spaghetti} --> {mineral water}, {mineral water} --> {spaghetti}, {chocolate} --> {mineral water}, {mineral water} --> {chocolate}

## Compare association rules

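Each of these association rules has a single item in its antecedent and a single item in its consequent. Among them, {spaghetti} --> {mineral water} has the highest confidence (0.343) and, together with its reverse rule, the highest lift (1.439), while {eggs} --> {mineral water} has the lowest lift (1.189). From the given dataset, we only could find one item set which 
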
## Briefly explain importance of discovered rules

These discovered association rules indicate that mineral water is commonly bought together with eggs, spaghetti or chocolate. Whenever a customer purchases any of these items, we can recommend "mineral water": each of these rules has a lift above 1 and a confidence between roughly 0.28 and 0.34, and mineral water is the most frequently bought item overall.  

In [26]:
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.2)
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(mineral water),(milk),0.238368,0.129583,0.047994,0.201342,1.553774,0.017105,1.08985
1,(milk),(mineral water),0.129583,0.238368,0.047994,0.37037,1.553774,0.017105,1.20965
2,(spaghetti),(milk),0.17411,0.129583,0.035462,0.203675,1.571779,0.0129,1.093043
3,(milk),(spaghetti),0.129583,0.17411,0.035462,0.273663,1.571779,0.0129,1.137061
4,(chocolate),(milk),0.163845,0.129583,0.032129,0.196094,1.513276,0.010898,1.082736
5,(milk),(chocolate),0.129583,0.163845,0.032129,0.247942,1.513276,0.010898,1.111823
6,(milk),(eggs),0.129583,0.179709,0.030796,0.237654,1.322437,0.007509,1.076009
7,(eggs),(milk),0.179709,0.129583,0.030796,0.171365,1.322437,0.007509,1.050423
8,(spaghetti),(mineral water),0.17411,0.238368,0.059725,0.343032,1.439085,0.018223,1.159314
9,(mineral water),(spaghetti),0.238368,0.17411,0.059725,0.250559,1.439085,0.018223,1.102008


In [27]:
rules["antecedent_len"] = rules["antecedents"].apply(len)
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_len
0,(mineral water),(milk),0.238368,0.129583,0.047994,0.201342,1.553774,0.017105,1.08985,1
1,(milk),(mineral water),0.129583,0.238368,0.047994,0.37037,1.553774,0.017105,1.20965,1
2,(spaghetti),(milk),0.17411,0.129583,0.035462,0.203675,1.571779,0.0129,1.093043,1
3,(milk),(spaghetti),0.129583,0.17411,0.035462,0.273663,1.571779,0.0129,1.137061,1
4,(chocolate),(milk),0.163845,0.129583,0.032129,0.196094,1.513276,0.010898,1.082736,1
5,(milk),(chocolate),0.129583,0.163845,0.032129,0.247942,1.513276,0.010898,1.111823,1
6,(milk),(eggs),0.129583,0.179709,0.030796,0.237654,1.322437,0.007509,1.076009,1
7,(eggs),(milk),0.179709,0.129583,0.030796,0.171365,1.322437,0.007509,1.050423,1
8,(spaghetti),(mineral water),0.17411,0.238368,0.059725,0.343032,1.439085,0.018223,1.159314,1
9,(mineral water),(spaghetti),0.238368,0.17411,0.059725,0.250559,1.439085,0.018223,1.102008,1


In [35]:
rules[ (rules['antecedent_len'] >= 1) &
       (rules['confidence'] > 0.30) &
       (rules['lift'] > 1.2) ]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_len
1,(milk),(mineral water),0.129583,0.238368,0.047994,0.37037,1.553774,0.017105,1.20965,1
8,(spaghetti),(mineral water),0.17411,0.238368,0.059725,0.343032,1.439085,0.018223,1.159314,1
10,(frozen vegetables),(mineral water),0.095321,0.238368,0.035729,0.374825,1.572463,0.013007,1.21827,1
12,(chocolate),(mineral water),0.163845,0.238368,0.05266,0.3214,1.348332,0.013604,1.122357,1
18,(pancakes),(mineral water),0.095054,0.238368,0.033729,0.354839,1.488616,0.011071,1.180529,1
20,(ground beef),(mineral water),0.098254,0.238368,0.040928,0.416554,1.747522,0.017507,1.305401,1
22,(ground beef),(spaghetti),0.098254,0.17411,0.039195,0.398915,2.291162,0.022088,1.373997,1


In [38]:
rules[rules['antecedents'] == {'milk'}]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_len
1,(milk),(mineral water),0.129583,0.238368,0.047994,0.37037,1.553774,0.017105,1.20965,1
3,(milk),(spaghetti),0.129583,0.17411,0.035462,0.273663,1.571779,0.0129,1.137061,1
5,(milk),(chocolate),0.129583,0.163845,0.032129,0.247942,1.513276,0.010898,1.111823,1
6,(milk),(eggs),0.129583,0.179709,0.030796,0.237654,1.322437,0.007509,1.076009,1


In [43]:
# Take as input the item that the user would like to buy
print("Enter the item you want to buy and I will provide a recommendation")
item = input()

print("Items frequently bought together with " + item)
rules[rules['antecedents'] == {item}]

Enter the item you want to buy and I will provide a recommendation
spaghetti
Items frequently bought together with spaghetti


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_len
2,(spaghetti),(milk),0.17411,0.129583,0.035462,0.203675,1.571779,0.0129,1.093043,1
8,(spaghetti),(mineral water),0.17411,0.238368,0.059725,0.343032,1.439085,0.018223,1.159314,1
17,(spaghetti),(chocolate),0.17411,0.163845,0.039195,0.225115,1.373952,0.010668,1.07907,1
23,(spaghetti),(ground beef),0.17411,0.098254,0.039195,0.225115,2.291162,0.022088,1.163716,1
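
The lookup above returns every matching rule in table order. One possible refinement, sketched below, is to rank the suggestions by lift and return only the item names. The `recommend` function and the dictionary layout are hypothetical; the values are rounded copies of rows from the tables above rather than a live mlxtend dataframe.

```python
# Rules copied from the tables above as plain dictionaries (values rounded).
rules = [
    {"antecedent": "spaghetti", "consequent": "milk", "confidence": 0.204, "lift": 1.572},
    {"antecedent": "spaghetti", "consequent": "mineral water", "confidence": 0.343, "lift": 1.439},
    {"antecedent": "spaghetti", "consequent": "chocolate", "confidence": 0.225, "lift": 1.374},
    {"antecedent": "spaghetti", "consequent": "ground beef", "confidence": 0.225, "lift": 2.291},
    {"antecedent": "milk", "consequent": "eggs", "confidence": 0.238, "lift": 1.322},
]

def recommend(item, rules, top_n=3):
    """Return up to top_n consequents for `item`, strongest lift first."""
    matches = [r for r in rules if r["antecedent"] == item]
    matches.sort(key=lambda r: r["lift"], reverse=True)
    return [r["consequent"] for r in matches[:top_n]]

print(recommend("spaghetti", rules))
print(recommend("chutney", rules))   # no rule has this antecedent
```

Ranking by lift favours items that are bought together more often than chance would suggest; ranking by confidence instead would favour generally popular items such as mineral water.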
