## Association Rule Mining

- Association rule mining is a technique for identifying underlying relationships between items. Consider a supermarket, where customers can buy a wide variety of items.
- A retailer can generate more profit if it can identify which items tend to be purchased together in the same transaction.
- The process of identifying such associations between products is called association rule mining.

### The Theory Behind Association Rule Mining

Several algorithms have been developed over the past few decades to identify the most frequently bought itemsets. Among them, the **Apriori algorithm** remains one of the most widely used. It relies on three key measures:

- Support
- Confidence
- Lift

### Support

Support measures the baseline popularity of an item. It is the number of transactions containing the item divided by the total number of transactions. For an item B:

\begin{equation*}
Support(B) = \frac{\text{Transactions containing } B}{\text{Total transactions}}
\end{equation*}

For example, if 100 out of 1,000 transactions contain ketchup:

Support(Ketchup) = (Transactions containing Ketchup) / (Total Transactions)

Support(Ketchup) = __100/1000__ = __10%__
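
As a rough illustration of the formula, the sketch below computes the support of every item in a handful of transactions. The transaction list is hypothetical toy data, not taken from `store_data.csv`, and the names `item_counts` and `item_support` are introduced only for this example.

```python
from collections import Counter

# Hypothetical toy transactions (not from store_data.csv)
transactions = [
    ["burger", "ketchup", "fries"],
    ["burger", "ketchup"],
    ["milk", "bread"],
    ["ketchup", "bread"],
    ["burger", "fries"],
]

# Count each item at most once per transaction
item_counts = Counter(item for t in transactions for item in set(t))

# Support = transactions containing the item / total transactions
item_support = {item: count / len(transactions) for item, count in item_counts.items()}

# Print items from most to least popular
for item, s in sorted(item_support.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{item}: {s:.0%}")
```

Here ketchup appears in 3 of the 5 transactions, so its support is 60%.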

### Confidence

- Confidence refers to the likelihood that item B is also bought when item A is bought.
- It is the number of transactions in which A and B are bought together, divided by the number of transactions in which A is bought:

\begin{equation*}
Confidence(A \rightarrow B) = \frac{\text{Transactions containing both } A \text{ and } B}{\text{Transactions containing } A}
\end{equation*}

For example, if 150 transactions contain a burger and 50 of those also contain ketchup:

Confidence(Burger→Ketchup) = (Transactions containing both Burger and Ketchup) / (Transactions containing Burger)

Confidence(Burger→Ketchup) = __50/150__ = __33.3%__
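
One point the formula makes easy to miss is that confidence depends on direction: Confidence(A→B) and Confidence(B→A) have different denominators. The sketch below shows this on hypothetical toy transactions; the helper name `confidence` is an assumption made for this example, not a library function.

```python
# Hypothetical toy transactions (not from store_data.csv)
transactions = [
    {"burger", "ketchup", "fries"},
    {"burger", "ketchup"},
    {"milk", "bread"},
    {"ketchup", "bread"},
    {"burger", "fries"},
    {"ketchup", "fries"},
]

def confidence(transactions, antecedent, consequent):
    """Share of transactions containing `antecedent` that also contain `consequent`."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        raise ValueError("antecedent never occurs, so confidence is undefined")
    with_both = [t for t in with_antecedent if consequent <= t]
    return len(with_both) / len(with_antecedent)

# 2 of the 3 burger transactions also contain ketchup
print(confidence(transactions, {"burger"}, {"ketchup"}))
# 2 of the 4 ketchup transactions also contain a burger
print(confidence(transactions, {"ketchup"}, {"burger"}))
```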

### Lift
- Lift(A→B) measures how much more likely B is to be sold when A is sold, compared with how often B is sold overall.
- Mathematically, it can be represented as:

\begin{equation*}
Lift(A \rightarrow B) = \frac{Confidence(A \rightarrow B)}{Support(B)}
\end{equation*}

Lift(Burger→Ketchup) = (Confidence (Burger→Ketchup)) / (Support (Ketchup))

__Lift(Burger→Ketchup) = 33.3% / 10% = 3.33__

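### Interpretation of Lift
A lift of 3.33 tells us that a customer who buys a burger is 3.33 times as likely to buy ketchup as a customer in general. A lift of 1 means there is no association between products A and B. A lift greater than 1 means they are bought together more often than chance would predict, and a lift below 1 means they are bought together less often.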
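
To see why a lift of 1 corresponds to no association, it can help to rewrite lift purely in terms of support. The identity below follows from the definitions of support and confidence given earlier. The counts are those of the burger and ketchup example (1,000 transactions, 100 with ketchup, 150 with a burger, 50 with both).

\begin{equation*}
Lift(A \rightarrow B) = \frac{Confidence(A \rightarrow B)}{Support(B)} = \frac{Support(A \text{ and } B)}{Support(A) \times Support(B)}
\end{equation*}

\begin{equation*}
Lift(Burger \rightarrow Ketchup) = \frac{50/1000}{(150/1000) \times (100/1000)} = \frac{0.05}{0.015} \approx 3.33
\end{equation*}

If A and B were bought independently, Support(A and B) would equal Support(A) × Support(B) and the ratio would be exactly 1. The right-hand side is also symmetric in A and B, so Lift(A→B) equals Lift(B→A), unlike confidence.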

### Steps Involved in Apriori Algorithm

- Set minimum values for support and confidence. This means we are only interested in rules for items that occur often enough on their own (i.e. support) and that co-occur often enough with other items (i.e. confidence).
- Extract all itemsets whose support is higher than the minimum threshold.
- From those itemsets, select all rules whose confidence is higher than the minimum threshold.
- Order the rules by descending lift.
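
The steps above can be sketched in plain Python. This is a minimal, unoptimized illustration and not the implementation used by the `apyori` package. The function name `apriori_sketch`, the toy transactions, and the thresholds are all assumptions made for this example, and the thresholds are treated as inclusive (`>=`).

```python
from itertools import combinations

def apriori_sketch(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) tuples sorted by lift."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Step 2: level-wise search for itemsets that meet the support threshold
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}  # itemset -> support
    k = 1
    while current:
        level = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        k += 1
        # Size-k candidates: unions of frequent (k-1)-itemsets whose
        # (k-1)-subsets are all frequent (the Apriori pruning property)
        current = {
            a | b
            for a in level
            for b in level
            if len(a | b) == k
            and all(frozenset(sub) in level for sub in combinations(a | b, k - 1))
        }

    # Step 3: split each frequent itemset into antecedent -> consequent rules
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                conf = s / frequent[antecedent]
                if conf >= min_confidence:
                    lift = conf / frequent[consequent]
                    rules.append((set(antecedent), set(consequent), s, conf, lift))

    # Step 4: order the rules by descending lift
    rules.sort(key=lambda rule: rule[4], reverse=True)
    return rules

# Hypothetical toy transactions (not from store_data.csv)
transactions = [
    ["burger", "ketchup", "fries"],
    ["burger", "ketchup"],
    ["milk", "bread"],
    ["ketchup", "bread"],
    ["burger", "fries"],
    ["ketchup", "fries"],
]

for antecedent, consequent, s, conf, lift in apriori_sketch(transactions, 0.3, 0.6):
    print(f"{sorted(antecedent)} -> {sorted(consequent)}: "
          f"support={s:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

The pruning step is what gives the algorithm its name: an itemset can only be frequent if all of its subsets are frequent, so larger candidates are built only from itemsets that already passed the support threshold.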

### Let's Implement the Apriori Algorithm

### 1.  Load Libraries

In [1]:
! pip install apyori

In [2]:
import numpy as np  
import matplotlib.pyplot as plt  
import pandas as pd  
from apyori import apriori  

### 2. Load Data

In [3]:
store_data = pd.read_csv("data/store_data.csv",header = None)
store_data.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


### 3. Preprocessing of Data

In [4]:
# Convert the DataFrame into a list of lists, one inner list per transaction (row of store_data).
# Note: empty cells are read as NaN, so str() turns them into the literal string 'nan',
# which apriori then treats as an ordinary item.
records = []
for i in range(0, len(store_data)):
    records.append([str(store_data.values[i, j]) for j in range(0, 20)])

# Each entry in records holds the items bought in one transaction
records[0]

['shrimp',
 'almonds',
 'avocado',
 'vegetables mix',
 'green grapes',
 'whole weat flour',
 'yams',
 'cottage cheese',
 'energy drink',
 'tomato juice',
 'low fat yogurt',
 'green tea',
 'honey',
 'salad',
 'mineral water',
 'salmon',
 'antioxydant juice',
 'frozen smoothie',
 'spinach',
 'olive oil']
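
Because empty cells are converted with `str()`, transactions with fewer than 20 items end up padded with the literal string `'nan'`. One possible cleanup, sketched here on a small hypothetical list with the same shape as `records`, drops those placeholders before the data is passed to `apriori`. The name `clean_records` is an assumption for this example. The rule count and figures reported later in this notebook were produced without this step and would differ if it were applied.

```python
# Hypothetical sample in the same shape as `records` (short rows padded with 'nan')
records = [
    ["burgers", "meatballs", "eggs", "nan", "nan"],
    ["chutney", "nan", "nan", "nan", "nan"],
    ["turkey", "avocado", "nan", "nan", "nan"],
]

# Keep only the real items in each transaction
clean_records = [[item for item in row if item != "nan"] for row in records]

print(clean_records)
```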

### 4. Build Apriori Model

In [16]:
association_rules = apriori(records, min_support=0.005, min_confidence=0.3, min_lift=3, min_length=2)  
association_results = list(association_rules)

In [17]:
# Let's review how many rules the Apriori algorithm generated
print("Total rules (frequent itemsets) found in store_data for the given thresholds: %s"
      % len(association_results))

Total rules (frequent itemsets) found in store_data for the given thresholds: 24


In [18]:
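for item in association_results:

    # item[0] is the frozenset of items in the itemset.
    # Note: a frozenset has no guaranteed order and only its first two
    # elements are printed, so the arrow below does not necessarily match
    # the base -> add direction of the statistics printed afterwards.
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])

    # item[1] is the support of the itemset
    print("Support: " + str(item[1]))

    # item[2] is the list of ordered statistics; in the first one,
    # index 2 is the confidence and index 3 is the lift
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")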

Rule: escalope -> mushroom cream sauce
Support: 0.005732568990801226
Confidence: 0.3006993006993007
Lift: 3.790832696715049
Rule: escalope -> pasta
Support: 0.005865884548726837
Confidence: 0.3728813559322034
Lift: 4.700811850163794
Rule: ground beef -> herb & pepper
Support: 0.015997866951073192
Confidence: 0.3234501347708895
Lift: 3.2919938411349285
Rule: ground beef -> tomato sauce
Support: 0.005332622317024397
Confidence: 0.3773584905660377
Lift: 3.840659481324083
Rule: pasta -> shrimp
Support: 0.005065991201173177
Confidence: 0.3220338983050847
Lift: 4.506672147735896
Rule: nan -> escalope
Support: 0.005732568990801226
Confidence: 0.3006993006993007
Lift: 3.790832696715049
Rule: nan -> escalope
Support: 0.005865884548726837
Confidence: 0.3728813559322034
Lift: 4.700811850163794
Rule: ground beef -> spaghetti
Support: 0.008665511265164644
Confidence: 0.31100478468899523
Lift: 3.165328208890303
Rule: shrimp -> mineral water
Support: 0.007199040127982935
Confidence: 0.305084745762711
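
The positional indexing in the loop above hides what each field means, and the printed arrow is taken from an unordered frozenset. As a sketch of an alternative, the function below assumes the result records expose apyori's named fields `items`, `support`, and `ordered_statistics`, with each statistic carrying `items_base`, `items_add`, `confidence`, and `lift`. The `Stat` and `Record` namedtuples and the `describe_rules` helper are hypothetical stand-ins so the example runs without the dataset. The sample values are those of the burger and ketchup example, not results from `store_data.csv`.

```python
from collections import namedtuple

# Hypothetical stand-ins mirroring the field names of apyori's result records
Stat = namedtuple("Stat", ["items_base", "items_add", "confidence", "lift"])
Record = namedtuple("Record", ["items", "support", "ordered_statistics"])

def describe_rules(results):
    """Return one formatted line per rule, using named fields instead of indexes."""
    lines = []
    for record in results:
        for stat in record.ordered_statistics:
            base = ", ".join(sorted(stat.items_base))
            add = ", ".join(sorted(stat.items_add))
            lines.append(
                f"{base} -> {add} | support={record.support:.4f} "
                f"confidence={stat.confidence:.4f} lift={stat.lift:.4f}"
            )
    return lines

# Values from the burger and ketchup example (50/1000, 50/150, 3.33)
sample = [
    Record(
        items=frozenset({"burger", "ketchup"}),
        support=0.05,
        ordered_statistics=[
            Stat(frozenset({"burger"}), frozenset({"ketchup"}), 1 / 3, 10 / 3),
        ],
    )
]

for line in describe_rules(sample):
    print(line)
```

With real data, `describe_rules(association_results)` would take the place of the sample list. Iterating over every ordered statistic also reports itemsets of three or more items in full, rather than only their first two elements.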

### Conclusion
Association rule mining algorithms such as Apriori are very useful for finding simple associations between data items. They are easy to implement and highly explainable.