## Association Rule Mining with Apriori Using `apyori`

**Association rule mining** is a technique used to discover interesting relationships, patterns, or associations between items in large datasets. One of the most popular algorithms for this task is **Apriori**, which helps in identifying frequent itemsets and generating association rules based on those frequent patterns.

The **apyori** library is a simple and lightweight Python library that implements the **Apriori** algorithm. It's commonly used for market basket analysis and can help uncover associations between items in transactional datasets.

### Steps in Apriori Algorithm:
1. **Identify Frequent Itemsets**: The Apriori algorithm first identifies the sets of items that frequently occur together in transactions. The frequency of an itemset is typically measured by **support** (the proportion of transactions that contain the itemset).
  
2. **Generate Association Rules**: After identifying the frequent itemsets, the algorithm generates rules based on those itemsets. Each rule has an **antecedent** (the items on the left-hand side) and a **consequent** (the items on the right-hand side). Each rule is evaluated based on the following metrics:
   - **Support**: The frequency of the itemset in the dataset.
   - **Confidence**: The likelihood that the consequent is purchased when the antecedent is purchased.
   - **Lift**: The strength of the association between the antecedent and consequent, considering their individual frequencies.
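
As a rough illustration of the first step, the sketch below mines frequent itemsets level by level in plain Python. It is not apyori's implementation: the function name `frequent_itemsets` and the toy baskets are assumptions made for this example only.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
    return frequent


# Hypothetical toy baskets, not taken from store_data.csv.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
]
result = frequent_itemsets(baskets, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what gives Apriori its name: an itemset can only be frequent if all of its subsets are frequent, so most candidates are discarded before their support is ever counted.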

# Market Basket Analysis of Store Data

## Dataset Description

* The dataset lists the products bought in about 7,500 transactions recorded over the course of a week at a French retail store.
* We use the **apyori** library to derive association rules with the Apriori algorithm.

## Import the Library



* Install the 'apyori' library, which is used to implement the Apriori algorithm for association rule mining
* The 'apyori' library helps in finding frequent itemsets and generating association rules from transactional data



In [None]:
!pip install apyori




* Import numpy for numerical computations
* Import pandas for data manipulation and reading CSV files
* Import matplotlib.pyplot for plotting data
* Import the `apriori` function from the apyori package



In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from apyori import apriori

## Read data and Display

* Read the CSV with `header=None`, which assumes the file has no header row
* Display the first 5 rows to check the data
* Print the shape of the dataset (rows and columns)





In [None]:
store_data = pd.read_csv("store_data.csv", header=None)
display(store_data.head())
print(store_data.shape)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


(7501, 20)


## Preprocessing on Data
* apyori expects the data as a list of transactions, where each transaction is a list of items.
* Convert the DataFrame into a list of lists where each sublist is a transaction.
* Loop over the rows and convert every cell to a string (the loop below covers 7,500 transactions).



In [None]:
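records = []
# Rows 1..7500 are used, so row 0 is skipped and 7500 transactions remain.
# Empty cells are NaN, and str() turns them into the literal item 'nan'.
for i in range(1, 7501):
    records.append([str(store_data.values[i, j]) for j in range(0, 20)])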
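
Because `str()` turns missing cells into the string `'nan'`, that placeholder can end up being treated as a product. A minimal sketch of one way to avoid this is shown below. The helper name `clean_transactions` and the sample rows are hypothetical; the rows are only shaped like what `store_data.values.tolist()` would return.

```python
import math


def clean_transactions(rows):
    """Drop missing cells (NaN or None) and return one list of strings per row."""
    cleaned = []
    for row in rows:
        items = [
            str(value)
            for value in row
            if value is not None
            and not (isinstance(value, float) and math.isnan(value))
        ]
        if items:  # skip rows that are entirely empty
            cleaned.append(items)
    return cleaned


# Hypothetical rows shaped like store_data.values.tolist().
rows = [
    ["burgers", "meatballs", "eggs", float("nan")],
    ["chutney", float("nan"), float("nan"), float("nan")],
]
records_clean = clean_transactions(rows)
print(records_clean)
```

On the notebook's data this would be called as `clean_transactions(store_data.values.tolist())`. It would also keep row 0, so the resulting rules should not be expected to match the outputs shown in this notebook exactly.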


*  Display the type of 'records' to ensure it's a list



In [None]:
print(type(records))

<class 'list'>


## Apriori Algorithm

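* Now apply the algorithm to the data.
* We provide thresholds for `min_support`, `min_confidence`, and `min_lift`, plus `min_length` for the minimum number of items in a rule. Note that apyori's documented keyword arguments are `min_support`, `min_confidence`, `min_lift`, and `max_length`, so `min_length` may have no effect.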

#### Measure 1: Support.
This says how popular an itemset is, as measured by the proportion of transactions in which an itemset appears. In Table 1 below, the support of {apple} is 4 out of 8, or 50%. Itemsets can also contain multiple items. For instance, the support of {apple, beer, rice} is 2 out of 8, or 25%.

![](https://annalyzin.files.wordpress.com/2016/04/association-rule-support-table.png?w=503&h=447)

If you discover that sales of items beyond a certain proportion tend to have a significant impact on your profits, you might consider using that proportion as your support threshold. You may then identify itemsets with support values above this threshold as significant itemsets.

#### Measure 2: Confidence.
This says how likely item Y is to be purchased when item X is purchased, expressed as {X -> Y}. It is measured as the proportion of transactions containing item X in which item Y also appears. In Table 1, the confidence of {apple -> beer} is 3 out of 4, or 75%.

![](https://annalyzin.files.wordpress.com/2016/03/association-rule-confidence-eqn.png?w=527&h=77)

One drawback of the confidence measure is that it might misrepresent the importance of an association. This is because it only accounts for how popular apples are, but not beers. If beers are also very popular in general, there will be a higher chance that a transaction containing apples will also contain beers, thus inflating the confidence measure. To account for the base popularity of both constituent items, we use a third measure called lift.

#### Measure 3: Lift.
This says how likely item Y is to be purchased when item X is purchased, while controlling for how popular item Y is. In Table 1, the lift of {apple -> beer} is 1, which implies no association between the items. A lift value greater than 1 means that item Y is likely to be bought if item X is bought, while a value less than 1 means that item Y is unlikely to be bought if item X is bought.
![](https://annalyzin.files.wordpress.com/2016/03/association-rule-lift-eqn.png?w=566&h=80)
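
The three measures can be computed directly from a list of transactions. The sketch below uses a hypothetical set of eight baskets, constructed only so that it reproduces the figures quoted above (it is not a copy of Table 1). The function names are assumptions for this example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(X and Y) / support(X)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """confidence(X -> Y) / support(Y)."""
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )


# Hypothetical baskets chosen to match the quoted numbers.
baskets = [
    ["apple", "beer"],
    ["apple", "beer", "rice"],
    ["apple", "beer", "rice"],
    ["apple", "milk"],
    ["beer", "milk"],
    ["beer", "rice"],
    ["beer"],
    ["milk", "rice"],
]

print(support(baskets, ["apple"]))                  # 0.5
print(support(baskets, ["apple", "beer", "rice"]))  # 0.25
print(confidence(baskets, ["apple"], ["beer"]))     # 0.75
print(lift(baskets, ["apple"], ["beer"]))           # 1.0
```

Here beer appears in 6 of 8 baskets (75%), which equals the confidence of {apple -> beer}. Buying apples therefore does not change the chance of buying beer, and the lift is exactly 1.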



* Apply the Apriori algorithm to the dataset with specific minimum thresholds for support, confidence, and lift

*  Convert the association rules to a list





In [None]:
association_rules = apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=3, min_length=2)
association_results = list(association_rules)
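
To get a feel for what `min_support=0.0045` means in practice, the threshold can be converted into a minimum transaction count. This is a small arithmetic sketch, assuming the threshold is inclusive (an itemset passes when its support is at least `min_support`) and using the 7,500 records built earlier.

```python
import math

n_transactions = 7500   # number of records built by the preprocessing loop
min_support = 0.0045    # threshold passed to apriori()

# An itemset passes when count / n_transactions >= min_support,
# so the smallest qualifying count is the ceiling of the product.
min_count = math.ceil(min_support * n_transactions)
print(min_count)                   # 34
print(min_count / n_transactions)  # support of an itemset seen exactly 34 times
```

In other words, an itemset has to appear in at least 34 of the 7,500 transactions to be considered, which corresponds to a support of roughly 0.00453.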

## How Many Relations Were Derived



* Print the number of association rules derived after applying the Apriori algorithm



In [None]:
print("There are {} relations derived.".format(len(association_results)))

There are 48 relations derived.


### Association Rules Derived



* Loop through the results and print the itemset behind each one




In [None]:
for result in association_results:
    # result[0] is the frozenset of items in this result
    print(result[0])

frozenset({'light cream', 'chicken'})
frozenset({'escalope', 'mushroom cream sauce'})
frozenset({'escalope', 'pasta'})
frozenset({'ground beef', 'herb & pepper'})
frozenset({'ground beef', 'tomato sauce'})
frozenset({'olive oil', 'whole wheat pasta'})
frozenset({'pasta', 'shrimp'})
frozenset({'light cream', 'nan', 'chicken'})
frozenset({'frozen vegetables', 'chocolate', 'shrimp'})
frozenset({'ground beef', 'cooking oil', 'spaghetti'})
frozenset({'nan', 'escalope', 'mushroom cream sauce'})
frozenset({'escalope', 'pasta', 'nan'})
frozenset({'ground beef', 'frozen vegetables', 'spaghetti'})
frozenset({'milk', 'frozen vegetables', 'olive oil'})
frozenset({'frozen vegetables', 'mineral water', 'shrimp'})
frozenset({'frozen vegetables', 'spaghetti', 'olive oil'})
frozenset({'frozen vegetables', 'spaghetti', 'shrimp'})
frozenset({'frozen vegetables', 'tomatoes', 'spaghetti'})
frozenset({'ground beef', 'grated cheese', 'spaghetti'})
frozenset({'ground beef', 'mineral water', 'herb & pepper'})


## Rules Generated



* Loop through the association results to print detailed information about each rule
* The rule is represented as an antecedent -> consequent



In [None]:
for item in association_results:
    # item[0] is the frozenset of items in the rule. A frozenset has no
    # guaranteed order, so this split is not necessarily the true
    # antecedent -> consequent, and only the first two items are shown.
    pair = item[0]
    items = list(pair)
    print("Rule: " + items[0] + " -> " + items[1])

    # item[1] is the support of the whole itemset.
    print("Support: " + str(item[1]))

    # item[2] is the list of ordered statistics; take the first entry and
    # read its confidence (index 2) and lift (index 3).
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")

Rule: light cream -> chicken
Support: 0.004533333333333334
Confidence: 0.2905982905982906
Lift: 4.843304843304844
Rule: escalope -> mushroom cream sauce
Support: 0.005733333333333333
Confidence: 0.30069930069930073
Lift: 3.7903273197390845
Rule: escalope -> pasta
Support: 0.005866666666666667
Confidence: 0.37288135593220345
Lift: 4.700185158809287
Rule: ground beef -> herb & pepper
Support: 0.016
Confidence: 0.3234501347708895
Lift: 3.2915549671393096
Rule: ground beef -> tomato sauce
Support: 0.005333333333333333
Confidence: 0.37735849056603776
Lift: 3.840147461662528
Rule: olive oil -> whole wheat pasta
Support: 0.008
Confidence: 0.2714932126696833
Lift: 4.130221288078346
Rule: pasta -> shrimp
Support: 0.005066666666666666
Confidence: 0.3220338983050848
Lift: 4.514493901473151
Rule: light cream -> nan
Support: 0.004533333333333334
Confidence: 0.2905982905982906
Lift: 4.843304843304844
Rule: frozen vegetables -> chocolate
Support: 0.005333333333333333
Confidence: 0.23255813953488372
L
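
The loop above reads fields by position and takes the rule direction from the unordered itemset. A hedged alternative is to read the antecedent and consequent from each ordered statistic by name. The sketch below assumes apyori's result tuples expose the fields `items`, `support`, and `ordered_statistics`, and that each ordered statistic exposes `items_base`, `items_add`, `confidence`, and `lift`. The namedtuples defined here are stand-ins with those field names so the example runs without the library. `iter_rules`, its `skip_items` parameter, and the sample values are hypothetical.

```python
from collections import namedtuple

# Stand-ins mirroring the assumed field names of apyori's result tuples.
OrderedStatistic = namedtuple(
    "OrderedStatistic", "items_base items_add confidence lift"
)
RelationRecord = namedtuple("RelationRecord", "items support ordered_statistics")


def iter_rules(results, skip_items=("nan",)):
    """Yield (antecedent, consequent, support, confidence, lift) per rule."""
    for record in results:
        if any(item in skip_items for item in record.items):
            continue  # drop itemsets that contain a placeholder item
        for stat in record.ordered_statistics:
            yield (
                sorted(stat.items_base),
                sorted(stat.items_add),
                record.support,
                stat.confidence,
                stat.lift,
            )


# Illustrative input only; these numbers are made up.
sample = [
    RelationRecord(
        items=frozenset({"a", "b"}),
        support=0.1,
        ordered_statistics=[
            OrderedStatistic(frozenset({"a"}), frozenset({"b"}), 0.5, 2.0),
            OrderedStatistic(frozenset({"b"}), frozenset({"a"}), 0.4, 2.0),
        ],
    ),
    RelationRecord(
        items=frozenset({"a", "nan"}),
        support=0.1,
        ordered_statistics=[
            OrderedStatistic(frozenset({"a"}), frozenset({"nan"}), 0.5, 2.0)
        ],
    ),
]

rules = list(iter_rules(sample))
for antecedent, consequent, supp, conf, lft in rules:
    print(f"{antecedent} -> {consequent}: support={supp}, confidence={conf}, lift={lft}")
```

If those field names hold, `iter_rules(association_results)` could be applied to the notebook's results directly, yielding one row per rule direction rather than one per itemset.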

References: **Theory** https://www.kdnuggets.com/2016/04/association-rules-apriori-algorithm-tutorial.html