# Association Rule Mining with Apriori Using `apyori`

**Association rule mining** is a technique used to discover interesting relationships, patterns, or associations between items in large datasets. One of the most popular algorithms for this task is **Apriori**, which helps in identifying frequent itemsets and generating association rules based on those frequent patterns.

The **apyori** library is a simple and lightweight Python library that implements the **Apriori** algorithm.

**The Apriori Algorithm**

The idea behind the **Apriori** algorithm is simple: it looks for sets of items that appear together frequently and creates rules based on those itemsets. It relies on the fact that every subset of a frequent itemset must itself be frequent, so any candidate that contains an infrequent subset can be discarded early.

**What is an "Itemset"?**

An itemset is a collection of items that appear together in transactions. For example, if you’re analyzing grocery store data, an itemset could be a pair of items like {bread, butter}.

**Steps in the Apriori Algorithm:**
1. **Identify Frequent Itemsets**: The **Apriori** algorithm first identifies the sets of items that frequently occur together in transactions. The frequency of an itemset is typically measured by **support** (the proportion of transactions that contain the itemset).
  
2. **Generate Association Rules**: After identifying the frequent itemsets, the algorithm generates rules based on those itemsets. Each rule has an **antecedent** (the items on the left-hand side) and a **consequent** (the items on the right-hand side). Each rule is evaluated based on the following metrics:
   - **Support**
   - **Confidence**
   - **Lift**
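
As a rough illustration of the first step above, here is a minimal pure-Python sketch of the level-wise search for frequent itemsets. The function name `frequent_itemsets`, the toy `baskets`, and the 0.6 threshold are made up for this example. It is not how apyori is implemented internally, only a small instance of the same idea.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    all_items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in all_items}
    current = {s for s in current if support(s) >= min_support}

    result = {}
    k = 1
    while current:
        for s in current:
            result[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return result


# Hypothetical toy data, for illustration only.
baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "milk"],
    ["bread", "butter", "milk"],
]
frequent = frequent_itemsets(baskets, min_support=0.6)
for itemset, value in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), value)
```

With these five baskets, only bread, butter, milk, and the pair {bread, butter} reach 60% support. Every other pair falls below the threshold, so no three-item candidate is ever generated.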

**Key Metrics in Association Rule Mining**

1. **Support**: Support tells us how frequently an itemset appears in the dataset. For example, if 100 out of 1000 transactions have both bread and butter, the support is 10% (100/1000).

2. **Confidence**: Confidence is the probability that the consequent Y is bought given that the antecedent X is bought: confidence(X -> Y) = support(X and Y) / support(X). For instance, if 50 transactions contain both bread and butter, and 40 of them also contain jam, then the confidence of the rule {bread, butter} -> {jam} is 80% (40/50).

3. **Lift**: Lift measures the strength of the association by comparing the confidence of a rule with how often the consequent appears overall: lift(X -> Y) = confidence(X -> Y) / support(Y). A lift greater than 1 means the items are bought together more often than would be expected if they were independent, a lift of 1 means no association, and a lift below 1 means they are bought together less often than expected.
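
The three metrics can be worked through with a few lines of arithmetic. The sketch below uses hypothetical counts: 1,000 transactions, 50 that contain bread and butter, 40 of those that also contain jam, and an assumed 200 transactions that contain jam overall. The helper names are made up for this example.

```python
def support(count_itemset, n_transactions):
    """Share of all transactions that contain the itemset."""
    return count_itemset / n_transactions


def confidence(count_both, count_antecedent):
    """Share of antecedent transactions that also contain the consequent."""
    return count_both / count_antecedent


def lift(count_both, count_antecedent, count_consequent, n_transactions):
    """Confidence of the rule divided by the support of the consequent."""
    return confidence(count_both, count_antecedent) / support(
        count_consequent, n_transactions
    )


# Hypothetical counts for the rule {bread, butter} -> {jam}.
n = 1000                 # all transactions
n_bread_butter = 50      # contain bread and butter
n_bread_butter_jam = 40  # contain bread, butter, and jam
n_jam = 200              # contain jam (assumed for this example)

rule_support = support(n_bread_butter_jam, n)
rule_confidence = confidence(n_bread_butter_jam, n_bread_butter)
rule_lift = lift(n_bread_butter_jam, n_bread_butter, n_jam, n)

print(f"support={rule_support:.2f} confidence={rule_confidence:.2f} lift={rule_lift:.2f}")
```

Under these assumed counts, jam is bought in 20% of all transactions but in 80% of those that contain bread and butter, which gives a lift of 4.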


**Working with the apyori Library**

The apyori library is a simple, easy-to-use Python package that implements the Apriori algorithm. We’ll use it to apply association rule mining to some data.

**Steps to Apply Apriori**





*   **Step 1: Install and Import the Libraries**

    First, we need to install the apyori library and import some other helpful libraries.





In [None]:
!pip install apyori

Collecting apyori
  Downloading apyori-1.1.2.tar.gz (8.6 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py) ... done
  Created wheel for apyori: filename=apyori-1.1.2-py3-none-any.whl size=5954 sha256=6d6ba0d3363da3d12518be06b21a670453115afa1c8684f0be226996714675c9
  Stored in directory: /root/.cache/pip/wheels/77/3d/a6/d317a6fb32be58a602b1e8c6b5d6f31f79322da554cad2a5ea
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2







The next cell imports the libraries used in this tutorial:

*   **numpy**: Helps with numerical computations (e.g., working with arrays).
*   **pandas**: Used for reading and manipulating data (e.g., loading CSV files).
*   **matplotlib**: Helps to create graphs and charts.
*   **apyori**: The library that actually performs the association rule mining.





In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from apyori import apriori




*   **Step 2: Load the Data**

    Now, let's load the dataset. Assume we have data from a retail store where each row is one transaction and lists the items that were bought together. We load it from a CSV file and take a look at the first few rows.

*   `header=None` tells pandas that the CSV has no header row, so the first line is treated as data.
*   `store_data.head()` shows the first few rows of the data to give an idea of what it looks like.
*   `store_data.shape` returns the number of rows (transactions) and columns (item slots per transaction). Transactions with fewer items than columns are padded with empty (`NaN`) cells.





In [None]:
store_data = pd.read_csv("store_data.csv", header=None)
display(store_data.head())
print(store_data.shape)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


(7501, 20)




*   **Step 3: Preprocess the Data**

*   Before applying the Apriori algorithm, the data needs to be in a specific format: a list of transactions, where each transaction is a list of the items purchased.
*   We loop through the rows and convert each one into a list of item strings. This list is what the algorithm searches for patterns.




       

In [None]:
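records = []
# Starts at row 1, so row 0 is skipped: 7,500 of the 7,501 rows are used.
for i in range(1, 7501):
    # str() turns empty (NaN) cells into the item string 'nan'.
    records.append([str(store_data.values[i, j]) for j in range(0, 20)])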
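
Because the loop above converts empty cells to the string `'nan'`, that string is later treated as an ordinary item. One possible alternative, sketched here with only the standard `csv` module, is to drop empty fields while reading and to keep every row. The inline `sample_csv` is a hypothetical stand-in for `store_data.csv`, and switching to this preprocessing would change the results shown later in this notebook.

```python
import csv
import io

# Toy stand-in for store_data.csv: no header row, one transaction per
# line, short transactions padded with empty fields.
sample_csv = io.StringIO(
    "burgers,meatballs,eggs,\n"
    "chutney,,,\n"
    "turkey,avocado,,\n"
)

# Keep only non-empty fields, so no placeholder item enters a transaction.
clean_records = [
    [item.strip() for item in row if item.strip()]
    for row in csv.reader(sample_csv)
]

print(clean_records)
```

For the real file, `io.StringIO(...)` would be replaced by `open("store_data.csv", newline="")`. A pandas-based variant along the lines of `store_data.apply(lambda row: row.dropna().tolist(), axis=1).tolist()` should give a similar result.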


*  Display the type of 'records' to ensure it's a list



In [None]:
print(type(records))

<class 'list'>




*   **Step 4: Apply the Apriori Algorithm**



*   Now we're ready to apply the Apriori algorithm to discover frequent itemsets and generate rules. We set thresholds for support, confidence, and lift to control how frequent and how strong the rules must be.
*   We pass `min_support`, `min_confidence`, `min_lift`, and `min_length` to `apriori`.
*   `min_support=0.0045`: An itemset must appear in at least 0.45% of the transactions to be considered frequent.
*   `min_confidence=0.2`: A rule must have at least 20% confidence to be kept.
*   `min_lift=3`: The lift must be at least 3, meaning the items occur together at least three times as often as would be expected by chance.
*   `min_length=2`: Intended to restrict the results to rules that involve at least two items. Note that the length option documented by apyori is `max_length`, so `min_length` may have no effect; check the behavior of your installed version.

In [None]:
association_rules = apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=3, min_length=2)
association_results = list(association_rules)
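
To get a feel for what `min_support=0.0045` means in absolute terms, the threshold can be converted into a transaction count. This is a small sketch that assumes the 7,500 records produced by the preprocessing loop above; the variable names are made up for the example.

```python
import math

n_transactions = 7500  # records built in the preprocessing step
min_support = 0.0045

# Smallest whole number of transactions that satisfies the threshold.
min_count = math.ceil(min_support * n_transactions)

print(min_count)
print(min_count / n_transactions)
```

An itemset therefore has to appear in at least 34 transactions, which lines up with the smallest support values in the results (about 0.00453).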



* **Step 5: Interpret the Association Rules**


* Once we apply the Apriori algorithm, we get a list of association rules. Each rule represents a relationship between two or more items.
*  Print the number of association rules derived after applying the Apriori algorithm


     

    



In [None]:
print("There are {} relations derived.".format(len(association_results)))

There are 48 relations derived.


  


  **Displaying Generated Association Rules**







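*  Loop through the generated association rules and print the itemset of each one. Itemsets that contain `'nan'` come from empty cells that were converted to strings during preprocessing.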



In [None]:
for record in association_results:
    # Index 0 of each record is the itemset (a frozenset of item names).
    print(record[0])

frozenset({'chicken', 'light cream'})
frozenset({'escalope', 'mushroom cream sauce'})
frozenset({'escalope', 'pasta'})
frozenset({'herb & pepper', 'ground beef'})
frozenset({'tomato sauce', 'ground beef'})
frozenset({'olive oil', 'whole wheat pasta'})
frozenset({'shrimp', 'pasta'})
frozenset({'nan', 'chicken', 'light cream'})
frozenset({'shrimp', 'chocolate', 'frozen vegetables'})
frozenset({'ground beef', 'spaghetti', 'cooking oil'})
frozenset({'nan', 'escalope', 'mushroom cream sauce'})
frozenset({'nan', 'escalope', 'pasta'})
frozenset({'frozen vegetables', 'spaghetti', 'ground beef'})
frozenset({'olive oil', 'milk', 'frozen vegetables'})
frozenset({'shrimp', 'mineral water', 'frozen vegetables'})
frozenset({'olive oil', 'frozen vegetables', 'spaghetti'})
frozenset({'shrimp', 'frozen vegetables', 'spaghetti'})
frozenset({'tomatoes', 'frozen vegetables', 'spaghetti'})
frozenset({'spaghetti', 'grated cheese', 'ground beef'})
frozenset({'mineral water', 'herb & pepper', 'ground beef'})


**Iterating Over Association Rules and Displaying Detailed Information**

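* Loop through the association results to print detailed information about each rule.
* Each rule is written as antecedent -> consequent.

We loop through each result and print:

* The first two items of the itemset, shown as a rule. Because a frozenset has no guaranteed order, the direction printed here may not match the antecedent and consequent that the statistics were computed for, and itemsets with three items show only two of them.
* The support of the itemset.
* The confidence of the first rule stored for the itemset.
* The lift of that rule.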









In [None]:
for item in association_results:
    # item[0] is the itemset; take its first two elements for display.
    # A frozenset is unordered, so the printed direction is not guaranteed.
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])

    # item[1] is the support of the itemset.
    print("Support: " + str(item[1]))

    # item[2] is the list of rule statistics; item[2][0] is the first one,
    # whose fields at index 2 and 3 are the confidence and the lift.
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")

Rule: chicken -> light cream
Support: 0.004533333333333334
Confidence: 0.2905982905982906
Lift: 4.843304843304844
Rule: escalope -> mushroom cream sauce
Support: 0.005733333333333333
Confidence: 0.30069930069930073
Lift: 3.7903273197390845
Rule: escalope -> pasta
Support: 0.005866666666666667
Confidence: 0.37288135593220345
Lift: 4.700185158809287
Rule: herb & pepper -> ground beef
Support: 0.016
Confidence: 0.3234501347708895
Lift: 3.2915549671393096
Rule: tomato sauce -> ground beef
Support: 0.005333333333333333
Confidence: 0.37735849056603776
Lift: 3.840147461662528
Rule: olive oil -> whole wheat pasta
Support: 0.008
Confidence: 0.2714932126696833
Lift: 4.130221288078346
Rule: shrimp -> pasta
Support: 0.005066666666666666
Confidence: 0.3220338983050848
Lift: 4.514493901473151
Rule: nan -> chicken
Support: 0.004533333333333334
Confidence: 0.2905982905982906
Lift: 4.843304843304844
Rule: shrimp -> chocolate
Support: 0.005333333333333333
Confidence: 0.23255813953488372
Lift: 3.26016083
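
The loop above reads results by position and takes the first two elements of an unordered set, so the printed direction is not reliable. A more explicit way to read a result is by field name. The sketch below uses stand-in namedtuples that mirror the shape of apyori's records. It assumes apyori 1.1.2 exposes the same field names (`items`, `support`, `ordered_statistics`, `items_base`, `items_add`, `confidence`, `lift`), and the example record uses made-up numbers.

```python
from collections import namedtuple

# Stand-ins mirroring the shape of apyori's result records (assumption:
# the real records use these same field names).
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"]
)
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)


def describe(record):
    """Return one line per rule in the record, with an explicit direction."""
    lines = []
    for stat in record.ordered_statistics:
        base = ", ".join(sorted(stat.items_base))  # antecedent
        add = ", ".join(sorted(stat.items_add))    # consequent
        lines.append(
            f"{{{base}}} -> {{{add}}} "
            f"(support={record.support:.4f}, "
            f"confidence={stat.confidence:.2f}, lift={stat.lift:.2f})"
        )
    return lines


# Hypothetical record with made-up numbers, for illustration only.
example = RelationRecord(
    items=frozenset({"pasta", "sauce"}),
    support=0.01,
    ordered_statistics=[
        OrderedStatistic(frozenset({"sauce"}), frozenset({"pasta"}), 0.4, 3.5),
    ],
)

for line in describe(example):
    print(line)
```

If the field names match, `describe` could be applied directly to each element of `association_results`, and it would list every rule stored for an itemset rather than only the first.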

**Summary**

In this module, we learned how to use the Apriori algorithm to find frequent itemsets and generate association rules from transaction data. We explored the key metrics (support, confidence, and lift), understood the process step-by-step, and learned how to interpret the generated rules. This technique is widely used in market basket analysis and can help businesses discover insights from their sales data.