### Overview
- Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. Its goal is to identify strong rules in the data using measures of interestingness such as support, confidence and lift.

For example, the rule `{onions, potatoes} => {burger}` found in the sales data of a supermarket would indicate that a customer who buys onions and potatoes together is likely to also buy hamburger meat.

### Some Basic Concepts:

- [Association rule](https://en.wikipedia.org/wiki/Association_rule_learning)
- [Support](https://en.wikipedia.org/wiki/Association_rule_learning#Useful_Concepts)
- [Confidence](https://en.wikipedia.org/wiki/Association_rule_learning#Useful_Concepts)
- [Lift](https://en.wikipedia.org/wiki/Association_rule_learning#Useful_Concepts)
- [Conviction](https://en.wikipedia.org/wiki/Association_rule_learning#Useful_Concepts)

- Some useful measures of interestingness for association rules:

![Measures of interestingness](https://www.researchgate.net/profile/Chulhyun_Kim/publication/228827521/figure/tbl1/AS:669547573026817@1536643989215/Measures-of-interestingness.png)

![Support, confidence and lift calculation for a patient with diabetes](https://www.researchgate.net/publication/321053532/figure/tbl1/AS:613924688896000@1523382460012/Support-confidence-and-lift-calculation-for-patient-with-diabetes.png)

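The measures listed above can be computed directly from a list of transactions. Below is a minimal plain-Python sketch using the standard definitions: support(X) is the fraction of transactions containing X, confidence(X => Y) = support(X ∪ Y) / support(X), lift = confidence / support(Y), and conviction = (1 - support(Y)) / (1 - confidence). The five toy transactions are hypothetical, chosen only to reuse the onions/potatoes/burger rule from the overview, and the helper names are illustrative rather than taken from any library.

```python
from math import inf

# Hypothetical toy transactions, made up for illustration only
transactions = [
    {"onions", "potatoes", "burger"},
    {"onions", "potatoes", "burger", "beer"},
    {"onions", "potatoes"},
    {"potatoes", "milk"},
    {"burger", "beer"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(X ∪ Y) / support(X): how often the rule holds when X occurs."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence divided by the consequent's baseline support (1 means independence)."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


def conviction(antecedent, consequent, transactions):
    """(1 - support(Y)) / (1 - confidence); infinite when the rule never fails."""
    conf = confidence(antecedent, consequent, transactions)
    if conf == 1:
        return inf
    return (1 - support(consequent, transactions)) / (1 - conf)


X, Y = {"onions", "potatoes"}, {"burger"}
print("support   :", support(X | Y, transactions))
print("confidence:", confidence(X, Y, transactions))
print("lift      :", lift(X, Y, transactions))
print("conviction:", conviction(X, Y, transactions))
```

Here the rule holds in 2 of the 3 baskets that contain onions and potatoes, so its confidence is 2/3; since burger alone appears in 3 of 5 baskets, the lift is only slightly above 1. The sketch does not guard against an antecedent with zero support.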



- Apriori is an algorithm for frequent item set mining and association rule learning over relational databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database.

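The level-wise idea can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code used by mlxtend or apyori: it joins frequent k-itemsets into (k+1)-item candidates, prunes any candidate that has an infrequent k-subset (every subset of a frequent itemset must itself be frequent), and counts the survivors with a pass over the data. The function name and the five toy baskets are hypothetical.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent_among(candidates):
        # Count each candidate once per transaction that contains it
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    # Level 1: frequent individual items
    current = frequent_among({frozenset([i]) for t in transactions for i in t})
    result = dict(current)
    k = 1
    while current:
        # Join step: unions of two frequent k-itemsets that have exactly k + 1 items
        prev = list(current)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune step: drop candidates with any infrequent k-subset
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k))
        }
        current = frequent_among(candidates)
        result.update(current)
        k += 1
    return result


# Hypothetical toy baskets
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

freq = apriori_frequent_itemsets(baskets, min_support=0.6)
for itemset, sup in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With a minimum support of 0.6, the triple `{bread, diapers, beer}` is never counted: its subset `{bread, beer}` appears in only 2 of 5 baskets, so the prune step removes it. This is the saving that makes Apriori practical.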


#### A simple outline of how Apriori works:

- Set a minimum support and a minimum confidence.
- Find all itemsets in the transactions whose support is at least the minimum support (the frequent itemsets).
- From these itemsets, keep all rules whose confidence is at least the minimum confidence.
- Sort the rules by decreasing lift.

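The last three steps of the outline can be sketched in plain Python with hypothetical data. For brevity, the frequent itemsets here are found by brute-force enumeration, which is fine for six items but exponential in general, instead of the level-wise Apriori search. The rule step only relies on every subset of a frequent itemset also being present, which both approaches guarantee. `generate_rules` is an illustrative name, not a library function.

```python
from itertools import combinations

# Hypothetical toy baskets
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
MIN_SUPPORT, MIN_CONFIDENCE = 0.4, 0.7

# Step 2 (brute-force stand-in for Apriori): every itemset with support >= MIN_SUPPORT
n = len(baskets)
items = sorted(set().union(*baskets))
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = sum(set(combo) <= b for b in baskets) / n
        if s >= MIN_SUPPORT:
            frequent[frozenset(combo)] = s


def generate_rules(frequent, min_confidence):
    """Steps 3 and 4: split each frequent itemset into antecedent -> consequent,
    keep rules with confidence >= min_confidence, and sort by decreasing lift."""
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        # Every non-empty proper subset is a possible antecedent
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    lift = conf / frequent[consequent]
                    rules.append((set(antecedent), set(consequent), sup, conf, lift))
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


rules = generate_rules(frequent, MIN_CONFIDENCE)
for antecedent, consequent, sup, conf, lift in rules[:5]:
    print(sorted(antecedent), "->", sorted(consequent),
          f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

Sorting by lift rather than confidence pushes rules whose consequent is rare overall to the top. In this toy data, `{cola} -> {diapers, milk}` ranks first.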



# Implementation using libraries:

Library documentation links:

- pandas `read_csv`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
- pandas `DataFrame`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
- scikit-learn `train_test_split`: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
- scikit-learn `StandardScaler.fit_transform`: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler.fit_transform
- mlxtend: http://rasbt.github.io/mlxtend/
- scikit-learn `confusion_matrix`: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
- Matplotlib `ListedColormap`: https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.colors.ListedColormap.html
- Matplotlib `Axes.scatter`: https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.axes.Axes.scatter.html
- Matplotlib colors API: https://matplotlib.org/3.1.1/api/colors_api.html

```python
# Implementation using mlxtend
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Loading the data
data = pd.read_csv('https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/07_Visualization/Online_Retail/Online_Retail.csv', encoding='iso-8859-1')
print(data.head())

# Exploring the columns of the data
print(data.columns)

# Exploring the different regions of transactions
print(data.Country.unique())

# Stripping extra spaces in the description
data['Description'] = data['Description'].str.strip()

# Dropping the rows without any invoice number
data.dropna(axis=0, subset=['InvoiceNo'], inplace=True)
data['InvoiceNo'] = data['InvoiceNo'].astype('str')

# Dropping cancelled (credit) transactions, whose invoice number contains 'C'
data = data[~data['InvoiceNo'].str.contains('C')]

# Transactions done in France: one row per invoice, one column per product,
# each cell holding the total quantity bought (0 if the product is absent)
basket_France = (data[data['Country'] == "France"]
                 .groupby(['InvoiceNo', 'Description'])['Quantity']
                 .sum().unstack().reset_index().fillna(0)
                 .set_index('InvoiceNo'))
```

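The pandas chain above turns line items into one row per invoice and one column per product. The same preprocessing can be sketched in plain Python to make each step explicit: skip rows with no invoice number or from other countries, drop invoice numbers containing 'C', strip descriptions, sum quantities per (invoice, product), then one-hot encode with "quantity > 0". The rows and the `build_baskets` helper are hypothetical illustrations, not data from the Online Retail file.

```python
from collections import defaultdict

# Hypothetical line items: (InvoiceNo, Description, Quantity, Country)
rows = [
    ("536370", "PAPER CUPS ", 24, "France"),       # trailing space is stripped
    ("536370", "POSTAGE", 3, "France"),
    ("536852", "POSTAGE", 1, "France"),
    ("536852", "PAPER PLATES", 12, "France"),
    ("C536854", "POSTAGE", -1, "France"),          # cancelled: contains 'C'
    ("536974", "PAPER CUPS", 0, "France"),
    ("536365", "PAPER PLATES", 6, "United Kingdom"),
    (None, "PAPER CUPS", 5, "France"),             # missing invoice number
]


def build_baskets(rows, country):
    """Return (product names, {invoice: 0/1 list}) for one country."""
    quantities = defaultdict(lambda: defaultdict(int))
    for invoice, description, qty, row_country in rows:
        if invoice is None or row_country != country:
            continue
        invoice = str(invoice)
        if "C" in invoice:
            continue
        # Equivalent of groupby(['InvoiceNo', 'Description'])['Quantity'].sum()
        quantities[invoice][description.strip()] += qty
    products = sorted({p for items in quantities.values() for p in items})
    # Equivalent of unstack().fillna(0) followed by the hot encoding
    matrix = {
        invoice: [int(items.get(p, 0) > 0) for p in products]
        for invoice, items in sorted(quantities.items())
    }
    return products, matrix


products, matrix = build_baskets(rows, "France")
print(products)
for invoice, encoded in matrix.items():
    print(invoice, encoded)
```

Note that an invoice whose quantities are all zero still becomes a row of zeros, just as it does after `fillna(0)` in the pandas version.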
```python
# Hot-encoding the basket so it suits mlxtend: True if the product was bought
# (positive total quantity) in that invoice, False otherwise
basket_France = basket_France > 0

# Building the model: itemsets that appear in at least 5% of invoices
frq_items = apriori(basket_France, min_support=0.05, use_colnames=True)

# Collecting the inferred rules (lift >= 1) in a dataframe.
# Note: some newer mlxtend releases also take a num_itemsets argument here;
# check the documentation of your installed version.
rules = association_rules(frq_items, metric="lift", min_threshold=1)
rules = rules.sort_values(['confidence', 'lift'], ascending=[False, False])
print(rules.head())
```

```
                                           antecedents  \
45                        (JUMBO BAG WOODLAND ANIMALS)   
260  (PLASTERS IN TIN CIRCUS PARADE, RED TOADSTOOL ...   
272  (RED TOADSTOOL LED NIGHT LIGHT, PLASTERS IN TI...   
302  (SET/20 RED RETROSPOT PAPER NAPKINS, SET/6 RED...   
300  (SET/6 RED SPOTTY PAPER PLATES, SET/20 RED RET...   

                         consequents  antecedent support  consequent support  \
45                         (POSTAGE)            0.076531            0.765306   
260                        (POSTAGE)            0.051020            0.765306   
272                        (POSTAGE)            0.053571            0.765306   
302  (SET/6 RED SPOTTY PAPER PLATES)            0.102041            0.127551   
300    (SET/6 RED SPOTTY PAPER CUPS)            0.102041            0.137755   

      support  confidence      lift  leverage  conviction  
45   0.076531       1.000  1.306667  0.017961         inf  
260  0.051020       1.000  1.306667  0.011974     
```
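As a check on the metric definitions, row 45 of the table (`(JUMBO BAG WOODLAND ANIMALS) -> (POSTAGE)`) can be recomputed by hand from its printed support values. This sketch uses only the rounded numbers shown, so it agrees with mlxtend to about six decimal places. Leverage and conviction use the standard formulas: leverage = support(X ∪ Y) - support(X) * support(Y) and conviction = (1 - support(Y)) / (1 - confidence).

```python
# Rounded values copied from row 45 of the output above
antecedent_support = 0.076531   # JUMBO BAG WOODLAND ANIMALS
consequent_support = 0.765306   # POSTAGE
support = 0.076531              # both in the same invoice

confidence = support / antecedent_support
lift = confidence / consequent_support
# Observed co-occurrence minus what independence would predict
leverage = support - antecedent_support * consequent_support
# Conviction is infinite when the rule never fails (confidence == 1)
conviction = float("inf") if confidence == 1 else (1 - consequent_support) / (1 - confidence)

print(f"confidence={confidence:.3f} lift={lift:.6f} "
      f"leverage={leverage:.6f} conviction={conviction}")
```

Because every invoice containing the bag also contains POSTAGE, confidence is 1 and conviction is infinite. The lift of about 1.31 is simply 1 divided by the 76.5% baseline frequency of POSTAGE.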

### Implementation using apyori

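```python
# Implementation using apyori
import pandas as pd
from apyori import apriori

# Loading the data.
# Note: by default pd.read_csv treats the first line as the header, so the first
# transaction becomes the column names; pass header=None to keep it as data
# (the outputs in this section were produced without it).
data = pd.read_csv('https://raw.githubusercontent.com/luoyetx/Apriori/master/data.csv')
data.head()
```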


```
Unnamed: 0,corned_b,peppers,bourbon,cracker,chicken,apples,coke
0,olives,bourbon,coke,turkey,ice_crea,ham,baguette
1,hering,corned_b,olives,ham,turkey,bourbon,peppers
2,baguette,sardines,apples,peppers,avocado,ice_crea,bourbon
3,baguette,soda,hering,cracker,heineken,peppers,apples
4,baguette,soda,hering,cracker,heineken,corned_b,ham
```


```python
data.shape
```

```
(1000, 7)
```

```python
# Converting the pandas DataFrame into a list of lists (one list per transaction)
records = data.values.tolist()
records
```

```
[['olives', 'bourbon', 'coke', 'turkey', 'ice_crea', 'ham', 'baguette'],
 ['hering', 'corned_b', 'olives', 'ham', 'turkey', 'bourbon', 'peppers'],
 ['baguette',
  'sardines',
  'apples',
  'peppers',
  'avocado',
  'ice_crea',
  'bourbon'],
 ['baguette', 'soda', 'hering', 'cracker', 'heineken', 'peppers', 'apples'],
 ['baguette', 'soda', 'hering', 'cracker', 'heineken', 'corned_b', 'ham'],
 ['avocado', 'cracker', 'artichok', 'heineken', 'ham', 'ice_crea', 'olives'],
 ['hering', 'corned_b', 'apples', 'olives', 'steak', 'cracker', 'chicken'],
 ['corned_b', 'peppers', 'bourbon', 'cracker', 'chicken', 'avocado', 'soda'],
 ['baguette',
  'sardines',
  'apples',
  'peppers',
  'avocado',
  'ice_crea',
  'ice_crea'],
 ['soda', 'olives', 'bourbon', 'cracker', 'heineken', 'steak', 'corned_b'],
 ['soda', 'olives', 'bourbon', 'cracker', 'heineken', 'steak', 'steak'],
 ['soda', 'olives', 'bourbon', 'cracker', 'heineken', 'ham', 'hering'],
 ['corned_b',
  'peppers',
  'bourbon',
  'cracker',
  'chi
```

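Two details of the conversion above are worth handling explicitly: with `pd.read_csv` defaults, the first line of data.csv is used as the header, so it is not counted as a transaction, and some rows repeat an item (for example `ice_crea` twice). A minimal standard-library sketch that keeps every line as a transaction and drops duplicates within a line is shown below. The three inline lines are copied from the header and records shown above and stand in for the downloaded file. `read_transactions` is a hypothetical helper.

```python
import csv
import io

# Stand-in for data.csv: three lines in the same format, taken from the output above
sample = """corned_b,peppers,bourbon,cracker,chicken,apples,coke
olives,bourbon,coke,turkey,ice_crea,ham,baguette
baguette,sardines,apples,peppers,avocado,ice_crea,ice_crea
"""


def read_transactions(text):
    """Treat every non-empty CSV line as one transaction of distinct items."""
    transactions = []
    for row in csv.reader(io.StringIO(text)):
        items = [item.strip() for item in row if item.strip()]
        if items:
            # dict.fromkeys drops duplicates while keeping first-seen order
            transactions.append(list(dict.fromkeys(items)))
    return transactions


records = read_transactions(sample)
print(len(records))
print(records[2])
```

For the real file, `with open("data.csv", newline="") as f: records = read_transactions(f.read())` would apply the same logic to a local copy, and `pd.read_csv(url, header=None)` fixes the header issue within pandas.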
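```python
# Building the apriori model; apyori returns a generator of RelationRecord objects
rule_generator = apriori(records, min_support=0.05, min_confidence=0.7)
results = list(rule_generator)

# Printing the first five records
for record in results[:5]:
    print(record)
    print("\n")
```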

```
RelationRecord(items=frozenset({'heineken', 'artichok'}), support=0.252, ordered_statistics=[OrderedStatistic(items_base=frozenset({'artichok'}), items_add=frozenset({'heineken'}), confidence=0.8262295081967214, lift=1.377049180327869)])


RelationRecord(items=frozenset({'ice_crea', 'coke'}), support=0.22, ordered_statistics=[OrderedStatistic(items_base=frozenset({'coke'}), items_add=frozenset({'ice_crea'}), confidence=0.7457627118644068, lift=2.3826284724102456), OrderedStatistic(items_base=frozenset({'ice_crea'}), items_add=frozenset({'coke'}), confidence=0.7028753993610224, lift=2.3826284724102456)])


RelationRecord(items=frozenset({'heineken', 'cracker'}), support=0.366, ordered_statistics=[OrderedStatistic(items_base=frozenset({'cracker'}), items_add=frozenset({'heineken'}), confidence=0.7515400410677618, lift=1.2525667351129364)])


RelationRecord(items=frozenset({'soda', 'cracker'}), support=0.251, ordered_statistics=[OrderedStatistic(items_base=frozenset({'soda'}), items_add=f
```

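Each record printed above carries the rule direction in its `ordered_statistics`: `items_base` is the antecedent and `items_add` the consequent. The sketch below reads those named fields instead of the itemset, so every rule is printed in the right direction and with all of its items. To run without apyori installed, it defines stand-in namedtuples with the field names shown in the output and rebuilds the `ice_crea`/`coke` record from it. With apyori available, pass the real `results` list instead. `format_rules` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-ins mirroring the field names in the printed apyori output above
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])

# The second record from the output above, rebuilt by hand
results = [
    RelationRecord(
        items=frozenset({"ice_crea", "coke"}),
        support=0.22,
        ordered_statistics=[
            OrderedStatistic(frozenset({"coke"}), frozenset({"ice_crea"}),
                             0.7457627118644068, 2.3826284724102456),
            OrderedStatistic(frozenset({"ice_crea"}), frozenset({"coke"}),
                             0.7028753993610224, 2.3826284724102456),
        ],
    ),
]


def format_rules(records):
    """One line per antecedent -> consequent rule, read from the named fields."""
    lines = []
    for record in records:
        for stat in record.ordered_statistics:
            lhs = ", ".join(sorted(stat.items_base))
            rhs = ", ".join(sorted(stat.items_add))
            lines.append(f"{lhs} -> {rhs} | support={record.support:.3f} "
                         f"confidence={stat.confidence:.3f} lift={stat.lift:.3f}")
    return lines


for line in format_rules(results):
    print(line)
```

Iterating over every `OrderedStatistic`, rather than only the first, also lists both directions when both pass `min_confidence`. Here, that means `coke -> ice_crea` and `ice_crea -> coke`.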
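```python
for item in results:

    # item[0] is the itemset (a frozenset). A frozenset has no order, so the
    # "Rule" line below shows only two of its members, not antecedent -> consequent,
    # and larger itemsets are cut to two items (which is why several different
    # itemsets print as "apples -> avocado").
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])

    # item[1] is the support of the itemset
    print("Support: " + str(item[1]))

    # item[2] is the list of OrderedStatistic tuples; [0] takes the first one,
    # whose fields [2] and [3] are its confidence and lift
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")
```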

```
Rule: heineken -> artichok
Support: 0.252
Confidence: 0.8262295081967214
Lift: 1.377049180327869
Rule: ice_crea -> coke
Support: 0.22
Confidence: 0.7457627118644068
Lift: 2.3826284724102456
Rule: heineken -> cracker
Support: 0.366
Confidence: 0.7515400410677618
Lift: 1.2525667351129364
Rule: soda -> cracker
Support: 0.251
Confidence: 0.7893081761006289
Lift: 1.6207560084201824
Rule: heineken -> soda
Support: 0.257
Confidence: 0.8081761006289309
Lift: 1.3469601677148848
Rule: olives -> turkey
Support: 0.221
Confidence: 0.7809187279151945
Lift: 1.6509909681082335
Rule: apples -> avocado
Support: 0.112
Confidence: 0.8057553956834532
Lift: 2.055498458376156
Rule: apples -> avocado
Support: 0.096
Confidence: 0.7441860465116279
Lift: 2.0500993016849254
Rule: apples -> avocado
Support: 0.092
Confidence: 0.7301587301587301
Lift: 2.0114565569111025
Rule: apples -> baguette
Support: 0.097
Confidence: 0.751937984496124
Lift: 1.9182091441227653
Rule: apples -> sardines
Support: 0.095
Confidence: 0

Rule: apples -> steak
Support: 0.097
Confidence: 0.8220338983050848
Lift: 3.355240401245244
Rule: apples -> steak
Support: 0.097
Confidence: 0.8220338983050848
Lift: 3.4684974612028894
Rule: apples -> olives
Support: 0.097
Confidence: 0.8220338983050848
Lift: 3.2110699152542375
Rule: heineken -> avocado
Support: 0.104
Confidence: 0.7703703703703703
Lift: 3.093856909117953
Rule: avocado -> baguette
Support: 0.099
Confidence: 0.7333333333333333
Lift: 4.313725490196078
Rule: avocado -> ham
Support: 0.099
Confidence: 0.7734375
Lift: 4.745015337423313
Rule: heineken -> avocado
Support: 0.112
Confidence: 0.713375796178344
Lift: 2.864963036860819
Rule: heineken -> avocado
Support: 0.111
Confidence: 0.8671875
Lift: 3.4826807228915664
Rule: heineken -> avocado
Support: 0.113
Confidence: 0.7197452229299364
Lift: 2.890543063975648
Rule: heineken -> baguette
Support: 0.113
Confidence: 0.837037037037037
Lift: 2.906378600823045
Rule: heineken -> ham
Support: 0.1
Confidence: 0.78125
Lift: 2.134562841
```