Title - Implementation of the Apriori Algorithm

**Apriori Algorithm: Theory and Implementation**

The Apriori algorithm is a popular technique in data mining and association rule learning. It is used to discover frequent itemsets in a dataset and to generate association rules from them. Association rule mining looks for relationships among items that occur together in large transactional datasets, and it is commonly applied in market basket analysis and recommender systems.

**Theory:**

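Apriori is based on the observation that if an itemset is frequent, then all of its subsets must also be frequent. Equivalently, if an itemset is infrequent, none of its supersets can be frequent, so the algorithm can discard those supersets without counting them. The algorithm operates in two main steps: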

1. **Finding Frequent Itemsets:**
   - Support is a measure of how frequently an itemset appears in the dataset. It is defined as the ratio of the number of transactions containing the itemset to the total number of transactions.
   - The Apriori algorithm first scans the dataset to find the frequent 1-itemsets (individual items) by counting their occurrences and comparing the resulting support to a user-defined minimum support threshold. It then combines these frequent 1-itemsets into candidate 2-itemsets and discards every candidate whose support falls below the threshold.
   - This process continues iteratively, generating candidate (k+1)-itemsets from the frequent k-itemsets, until no more frequent itemsets can be found.

2. **Generating Association Rules:**
   - Once frequent itemsets are discovered, association rules are generated from them. An association rule is an implication of the form A -> B, where A and B are itemsets.
   - Confidence measures the strength of the rule A -> B: the number of transactions that contain both A and B divided by the number of transactions that contain A, i.e. support(A and B together) / support(A).
   - Lift is the ratio of the observed support of A and B together to the support expected if A and B were independent, i.e. support(A and B together) / (support(A) × support(B)). A lift greater than 1 indicates that A and B occur together more often than independence would predict, which makes the rule more interesting.
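
The two steps above can be sketched in plain Python. This is a minimal illustration on a made-up five-transaction dataset, not the implementation used by any library: the function names `frequent_itemsets` and `rule_metrics` and the toy data are assumptions introduced here, and the support counting is deliberately naive.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search returning {itemset: support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent individual items.
    singles = {frozenset([item]) for t in transactions for item in t}
    current = {s: support(s) for s in singles}
    current = {s: sup for s, sup in current.items() if sup >= min_support}

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        # Join: unions of two frequent k-itemsets that have exactly k+1 items.
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        # Prune: a candidate survives only if all of its k-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k))
        }
        current = {c: support(c) for c in candidates}
        current = {c: sup for c, sup in current.items() if sup >= min_support}
        k += 1
    return frequent


def rule_metrics(frequent, antecedent, consequent):
    """Support, confidence and lift of antecedent -> consequent.

    Raises KeyError if any of the itemsets involved is not frequent.
    """
    a, b = frozenset(antecedent), frozenset(consequent)
    support_ab = frequent[a | b]
    confidence = support_ab / frequent[a]
    lift = confidence / frequent[b]
    return support_ab, confidence, lift


# Toy data, made up for illustration only.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
    ["bread", "milk", "butter"],
]
frequent = frequent_itemsets(transactions, min_support=0.5)
support_ab, confidence, lift = rule_metrics(frequent, ["butter"], ["bread"])
print(len(frequent), support_ab, confidence, lift)
```

With a threshold of 0.5, {milk, butter} is infrequent (2 of 5 transactions), so the candidate {bread, milk, butter} is pruned without being counted. For butter -> bread, every transaction containing butter also contains bread, giving a confidence of 1.0 and a lift of 1.0 / 0.8 = 1.25.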

**Implementation:**

In the Python code below, we apply the Apriori algorithm to a dataset stored in `store_data.csv`. Each row of the file is one transaction, and the columns hold the items purchased in that transaction; transactions with fewer than 20 items are padded with empty cells. We use the `apyori` library to run the algorithm.

The code reads the dataset and converts it into a list of lists, one inner list per transaction. Because every cell is passed through `str()`, the empty padding cells become the string `nan`, which the algorithm then treats as an ordinary item. We then apply the Apriori algorithm using `apriori()` with thresholds for minimum support, confidence, and lift, together with a `min_length` argument.

Finally, the code prints the discovered frequent itemsets and the generated association rules, along with their support, confidence, and lift values.
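
To make the structure of those results easier to follow, here is a small sketch of how one result record can be read by field name. As far as the `apyori` source shows, each result is a named tuple with the fields `items`, `support`, and `ordered_statistics`, and each ordered statistic has `items_base`, `items_add`, `confidence`, and `lift`. The two named tuples below are stand-ins that mirror those field names so the sketch runs without the library or the dataset; `format_rules` is a hypothetical helper and the example values are made up.

```python
from collections import namedtuple

# Stand-ins mirroring the field names of apyori's result records (assumption:
# defined here only so the sketch runs without the library installed).
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"]
)


def format_rules(results):
    """Return one line per rule, using the rule's own antecedent and consequent."""
    lines = []
    for record in results:
        for stat in record.ordered_statistics:
            base = ", ".join(sorted(stat.items_base))
            add = ", ".join(sorted(stat.items_add))
            lines.append(
                f"{base} -> {add} (support={record.support:.4f}, "
                f"confidence={stat.confidence:.4f}, lift={stat.lift:.4f})"
            )
    return lines


# Hypothetical record with made-up values, for illustration only.
example = RelationRecord(
    items=frozenset({"butter", "bread"}),
    support=0.6,
    ordered_statistics=[
        OrderedStatistic(frozenset({"butter"}), frozenset({"bread"}), 1.0, 1.25)
    ],
)
for line in format_rules([example]):
    print(line)
```

Reading `items_base` and `items_add` ties the printed rule to the confidence and lift that belong to it, which taking the first two elements of an unordered `frozenset` does not guarantee.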

**Conclusion:**

The Apriori algorithm discovers relationships among items in large transactional datasets by first identifying frequent itemsets and then deriving association rules from them. The resulting rules, ranked by support, confidence, and lift, can inform product recommendations and market basket analysis. Libraries such as `apyori` make the algorithm straightforward to apply in Python.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from apyori import apriori

In [None]:
store_data = pd.read_csv("store_data.csv", header=None)
display(store_data.head())
print(store_data.shape)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


(7501, 20)


In [None]:
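# Build one list of item strings per transaction for apyori.
# Caveats: range(1, 7501) skips row 0, which is a real transaction because the
# file was read with header=None, and str() turns empty cells into the item "nan".
records = []
for i in range(1, 7501):
    records.append([str(store_data.values[i, j]) for j in range(0, 20)])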
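
The loop above keeps the padding cells as `nan` items, which is why itemsets such as `{'nan', 'light cream', 'chicken'}` appear in the results further down. One way to avoid this is to drop empty cells while loading. The sketch below uses the standard-library `csv` module instead of pandas; `load_transactions` is a hypothetical helper, and the two sample lines only imitate the shape of the rows shown in the preview. Results obtained from transactions loaded this way would differ from the outputs recorded in this notebook.

```python
import csv


def load_transactions(lines):
    """Parse CSV lines into transactions, dropping empty padding cells."""
    transactions = []
    for row in csv.reader(lines):
        items = [cell.strip() for cell in row if cell.strip()]
        if items:  # skip rows that contain no items at all
            transactions.append(items)
    return transactions


# Two lines shaped like the rows in the DataFrame preview.
sample = ["burgers,meatballs,eggs,,", "chutney,,,,"]
print(load_transactions(sample))
```

For the real file, the same function could be called as `with open("store_data.csv", newline="") as f: records = load_transactions(f)`, which also keeps the first row as a transaction.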

In [None]:
print(type(records))

<class 'list'>


In [None]:
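# Keep rules with support >= 0.45%, confidence >= 20% and lift >= 3.
# Note: apyori documents max_length but not min_length, so min_length=2 is likely ignored.
association_rules = apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=3, min_length=2)
association_results = list(association_rules)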

In [None]:
print("There are {} relations derived.".format(len(association_results)))

There are 48 relations derived.


In [None]:
# The first field of each result is the frozenset of items in the itemset.
for result in association_results:
    print(result[0])

frozenset({'light cream', 'chicken'})
frozenset({'escalope', 'mushroom cream sauce'})
frozenset({'escalope', 'pasta'})
frozenset({'herb & pepper', 'ground beef'})
frozenset({'tomato sauce', 'ground beef'})
frozenset({'olive oil', 'whole wheat pasta'})
frozenset({'shrimp', 'pasta'})
frozenset({'nan', 'light cream', 'chicken'})
frozenset({'shrimp', 'chocolate', 'frozen vegetables'})
frozenset({'cooking oil', 'spaghetti', 'ground beef'})
frozenset({'escalope', 'mushroom cream sauce', 'nan'})
frozenset({'escalope', 'pasta', 'nan'})
frozenset({'spaghetti', 'ground beef', 'frozen vegetables'})
frozenset({'milk', 'olive oil', 'frozen vegetables'})
frozenset({'shrimp', 'mineral water', 'frozen vegetables'})
frozenset({'spaghetti', 'olive oil', 'frozen vegetables'})
frozenset({'shrimp', 'spaghetti', 'frozen vegetables'})
frozenset({'spaghetti', 'frozen vegetables', 'tomatoes'})
frozenset({'spaghetti', 'ground beef', 'grated cheese'})
frozenset({'herb & pepper', 'ground beef', 'mineral water'})


In [None]:
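for item in association_results:
    # item[0] is the frozenset of all items in the itemset. A frozenset has no
    # defined order, so items[0] -> items[1] is only a label: it may not match
    # the direction of the rule below, and it drops the third item of 3-itemsets.
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])

    # item[1] is the support of the whole itemset.
    print("Support: " + str(item[1]))

    # item[2] lists the rules derived from the itemset; each entry is
    # (items_base, items_add, confidence, lift). Only the first one is shown.
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")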

Rule: light cream -> chicken
Support: 0.004533333333333334
Confidence: 0.2905982905982906
Lift: 4.843304843304844
Rule: escalope -> mushroom cream sauce
Support: 0.005733333333333333
Confidence: 0.30069930069930073
Lift: 3.7903273197390845
Rule: escalope -> pasta
Support: 0.005866666666666667
Confidence: 0.37288135593220345
Lift: 4.700185158809287
Rule: herb & pepper -> ground beef
Support: 0.016
Confidence: 0.3234501347708895
Lift: 3.2915549671393096
Rule: tomato sauce -> ground beef
Support: 0.005333333333333333
Confidence: 0.37735849056603776
Lift: 3.840147461662528
Rule: olive oil -> whole wheat pasta
Support: 0.008
Confidence: 0.2714932126696833
Lift: 4.130221288078346
Rule: shrimp -> pasta
Support: 0.005066666666666666
Confidence: 0.3220338983050848
Lift: 4.514493901473151
Rule: nan -> light cream
Support: 0.004533333333333334
Confidence: 0.2905982905982906
Lift: 4.843304843304844
Rule: shrimp -> chocolate
Support: 0.005333333333333333
Confidence: 0.23255813953488372
Lift: 3.2601