# LAB 3: Apriori Algorithm

### Definition:

The Apriori algorithm is a classic algorithm for frequent itemset mining and association rule learning in transactional databases. It first finds the individual items that occur frequently, then extends them one item at a time into larger itemsets for as long as those itemsets remain frequent. The resulting frequent itemsets are used to generate rules that describe relationships between items.

### Frequent Itemset:

A frequent itemset is a set of items that appear together in many transactions of a database. The support of an itemset is the proportion of transactions in which the itemset appears, and an itemset is frequent when its support meets or exceeds a predefined minimum support threshold.

### Association Rule:

An association rule is a relationship between two sets of items in a transactional database. It consists of an antecedent (the items on the left-hand side of the rule) and a consequent (the items on the right-hand side of the rule). The strength of an association rule is measured by metrics such as support, confidence, and lift.

### Support:

Support is a measure of how frequently an itemset appears in the database. It is calculated as the proportion of transactions containing the itemset.

$$ \text{Support}(X) = \frac{\text{Transactions containing } X}{\text{Total transactions}} $$
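A minimal sketch of the support formula in plain Python, assuming each transaction is represented as a set of item names. The five transactions are hypothetical toy data, not the lab's dataset, and `support` is an illustrative helper.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


# Hypothetical toy transactions (not the lab's my_data.csv)
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Eggs"},
    {"Milk", "Beer", "Chips"},
    {"Bread", "Milk", "Eggs"},
    {"Beer", "Chips"},
]

print(support({"Bread"}, transactions))          # 0.6: Bread is in 3 of 5 transactions
print(support({"Bread", "Milk"}, transactions))  # 0.4: both items together in 2 of 5
```

With a minimum support of 0.4, both itemsets above would count as frequent.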

### Confidence:

Confidence is a measure of the reliability of an association rule. It indicates the likelihood that the consequent will occur given that the antecedent has occurred.

$$ \text{Confidence}(X \Rightarrow Y) = \frac{\text{Support}(X \cup Y)}{\text{Support}(X)} $$
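A minimal sketch of the confidence formula, again on hypothetical toy transactions with illustrative `support` and `confidence` helpers. It also shows that confidence is not symmetric: swapping antecedent and consequent changes the denominator.

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence(X => Y) = Support(X ∪ Y) / Support(X)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical toy transactions
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Eggs"},
    {"Milk", "Beer", "Chips"},
    {"Bread", "Milk", "Eggs"},
    {"Beer", "Chips"},
]

print(confidence({"Eggs"}, {"Bread"}, transactions))            # 1.0: every Eggs basket has Bread
print(round(confidence({"Bread"}, {"Eggs"}, transactions), 2))  # 0.67: 2 of 3 Bread baskets have Eggs
```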

### Lift:

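Lift compares the observed support of $X \cup Y$ with the support expected if $X$ and $Y$ occurred independently, namely $\text{Support}(X) \times \text{Support}(Y)$. A lift above 1 means the consequent is more likely when the antecedent is present, a lift of exactly 1 means the two are independent, and a lift below 1 means the consequent is less likely when the antecedent is present.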

$$ \text{Lift}(X \Rightarrow Y) = \frac{\text{Support}(X \cup Y)}{\text{Support}(X) \times \text{Support}(Y)} $$
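A minimal sketch of the lift formula on hypothetical toy transactions, showing one rule with lift above 1 and one below 1. The `support` and `lift` helpers are illustrative, not part of any library.

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def lift(antecedent, consequent, transactions):
    """Lift(X => Y) = Support(X ∪ Y) / (Support(X) * Support(Y))."""
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / (support(antecedent, transactions) * support(consequent, transactions))


# Hypothetical toy transactions
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Eggs"},
    {"Milk", "Beer", "Chips"},
    {"Bread", "Milk", "Eggs"},
    {"Beer", "Chips"},
]

print(round(lift({"Beer"}, {"Chips"}, transactions), 2))  # 2.5: Beer and Chips always co-occur
print(round(lift({"Milk"}, {"Eggs"}, transactions), 2))   # 0.83: Eggs less likely with Milk
```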

### Algorithm:

1. **Generate Candidate Itemsets**: Start by identifying the frequent individual items in the database (frequent 1-itemsets). Then, at each level k, generate candidate k-itemsets by joining pairs of frequent (k-1)-itemsets that share k-2 items.

2. **Prune Candidate Itemsets**: Eliminate every candidate k-itemset that has at least one infrequent (k-1)-subset. By the Apriori property such a candidate cannot be frequent, so its support never needs to be counted.

3. **Calculate Support**: Scan the database to count the occurrences of each remaining candidate, calculate its support, and keep the candidates that meet the minimum support threshold as the frequent k-itemsets. Repeat steps 1 to 3 with k increased by one until no new frequent itemsets are found.

4. **Generate Association Rules**: For each frequent itemset, generate association rules by partitioning the itemset into antecedent-consequent pairs and calculate their confidence.

5. **Select High Confidence Rules**: Filter the association rules based on a minimum confidence threshold to retain only strong rules.
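The five steps above can be sketched from scratch in plain Python. This is a minimal, unoptimized illustration, not apyori's implementation: it assumes transactions are sets of item names, rescans the data for every support count, and uses a simplified join (any two frequent (k-1)-itemsets whose union has k items) instead of the classic shared-prefix join. The function names `apriori_itemsets` and `generate_rules` and the toy transactions are hypothetical.

```python
from itertools import combinations


def apriori_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Step 1: frequent 1-itemsets
    items = {item for t in transactions for item in t}
    level = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            level[frozenset([item])] = s

    frequent = {}
    k = 2
    while level:
        frequent.update(level)
        # Step 1 (continued): join frequent (k-1)-itemsets into candidate k-itemsets
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Step 2: prune candidates that have an infrequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        # Step 3: count support and keep only the frequent candidates
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level[c] = s
        k += 1
    return frequent


def generate_rules(frequent, min_confidence):
    """Steps 4-5: split each frequent itemset into antecedent => consequent, keep strong rules."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for base in combinations(itemset, r):
                antecedent = frozenset(base)
                consequent = itemset - antecedent
                # Every subset of a frequent itemset is frequent, so both lookups succeed
                conf = supp / frequent[antecedent]
                if conf >= min_confidence:
                    lift = conf / frequent[consequent]
                    rules.append((antecedent, consequent, supp, conf, lift))
    return rules


# Hypothetical toy transactions
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Eggs"},
    {"Milk", "Beer", "Chips"},
    {"Bread", "Milk", "Eggs"},
    {"Beer", "Chips"},
]
frequent = apriori_itemsets(transactions, min_support=0.4)
for antecedent, consequent, supp, conf, lift in generate_rules(frequent, min_confidence=0.7):
    print(sorted(antecedent), "->", sorted(consequent),
          f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

With `min_support=0.4`, the candidate {Bread, Milk, Eggs} is pruned without being counted, because its subset {Milk, Eggs} has support 0.2. The surviving rules are Eggs -> Bread, Beer -> Chips and Chips -> Beer, printed in no particular order.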

### Important Concepts:

- **Minimum Support Threshold**: The minimum threshold used to determine whether an itemset is considered frequent.

- **Candidate Generation**: The process of generating larger itemsets from smaller ones by joining pairs of frequent itemsets.

- **Pruning**: The process of eliminating candidate itemsets that contain subsets that are not frequent, reducing the search space.

- **Association Rule Evaluation Metrics**: Metrics such as support, confidence, and lift are used to evaluate the strength of association rules and identify meaningful relationships between items.

- **Apriori Property**: The Apriori property states that if an itemset is infrequent, all its supersets are also infrequent. This property is used for efficient candidate generation and pruning.
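The Apriori property holds because support is anti-monotone: adding an item to an itemset can only keep or shrink the set of transactions that contain it. A small brute-force check of this on hypothetical toy transactions, using an illustrative `support` helper, followed by the pruning consequence it justifies:

```python
from itertools import combinations

# Hypothetical toy transactions
transactions = [frozenset(t) for t in (
    {"Bread", "Milk"},
    {"Bread", "Eggs"},
    {"Milk", "Beer", "Chips"},
    {"Bread", "Milk", "Eggs"},
    {"Beer", "Chips"},
)]


def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


items = sorted(set().union(*transactions))

# Anti-monotonicity: adding any item to any itemset never raises its support
monotone = all(
    support(set(base) | {extra}) <= support(base)
    for size in range(1, len(items))
    for base in combinations(items, size)
    for extra in items
    if extra not in base
)
print(monotone)  # True

# Consequence used for pruning: {Milk, Eggs} is infrequent at min_support = 0.4,
# so its superset {Bread, Milk, Eggs} cannot be frequent either
min_support = 0.4
print(support({"Milk", "Eggs"}) < min_support)           # True
print(support({"Bread", "Milk", "Eggs"}) < min_support)  # True
```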


# Import Necessary Libraries:

- `pandas` for data manipulation
- `apyori` for the Apriori algorithm

```python
import pandas as pd
from apyori import apriori
```

# 1. Load the Dataset:

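Load the transactions from the CSV file. Each row is one transaction, and its `Items` column lists the items as a single comma-separated string.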

```python
data = pd.read_csv('my_data.csv')
```

# 2. Preprocess the Data:

Split each transaction's `Items` string on `', '` into a list of item names. `apriori` expects a list of transactions, each an iterable of items.

```python
# One list of item names per transaction, e.g. "Beer, Chips" -> ['Beer', 'Chips']
trx = [items.split(', ') for items in data['Items']]
```
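To make the expected input concrete, here is a sketch that assumes `my_data.csv` has an `Items` column of comma-separated item names. The two rows are hypothetical and are read from an in-memory string rather than the real file. The second row shows why splitting on `', '` is fragile: a missing space silently merges two items into one.

```python
import io

import pandas as pd

# Hypothetical stand-in for my_data.csv: one transaction per row in an "Items" column.
# The second row omits the space after the comma.
csv_text = 'Items\n"Bread, Milk"\n"Eggs,Bread"\n'
data = pd.read_csv(io.StringIO(csv_text))

# Same split as the lab: only works when items are separated by ", "
trx_strict = [items.split(', ') for items in data['Items']]
print(trx_strict)  # [['Bread', 'Milk'], ['Eggs,Bread']]

# More forgiving variant: split on "," and strip surrounding whitespace
trx_robust = [[item.strip() for item in items.split(',')] for items in data['Items']]
print(trx_robust)  # [['Bread', 'Milk'], ['Eggs', 'Bread']]
```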

# 3. Ask the User for Minimum Support, Confidence, and Lift:

```python
min_support = float(input("Enter the minimum support (between 0 and 1): "))
min_confidence = float(input("Enter the minimum confidence (between 0 and 1): "))
min_lift = float(input("Enter the minimum lift: "))
```


```text
Enter the minimum support (between 0 and 1): 0.2
Enter the minimum confidence (between 0 and 1): 0.7
Enter the minimum lift: 1
```
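`float(input(...))` accepts any number, so a typo such as `7` instead of `0.7` would silently produce no rules. A minimal sketch of a validating prompt: `read_threshold` is a hypothetical helper (not part of apyori), and its `ask` parameter defaults to the built-in `input` but can be replaced with canned replies for testing.

```python
def read_threshold(prompt, low=0.0, high=None, ask=input):
    """Prompt until the reply parses as a float in [low, high] (no upper bound if high is None)."""
    while True:
        raw = ask(prompt)
        try:
            value = float(raw)
        except ValueError:
            print(f"{raw!r} is not a number; please try again.")
            continue
        if value >= low and (high is None or value <= high):
            return value
        print(f"{value} is out of range; please try again.")


# Interactive use, mirroring the cell above:
# min_support = read_threshold("Enter the minimum support (between 0 and 1): ", 0, 1)
# min_confidence = read_threshold("Enter the minimum confidence (between 0 and 1): ", 0, 1)
# min_lift = read_threshold("Enter the minimum lift: ", 0)

# Non-interactive check with canned replies instead of input()
replies = iter(["abc", "7", "0.2"])
print(read_threshold("min support: ", 0, 1, ask=lambda _: next(replies)))  # 0.2
```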


# 4. Apply the Apriori Algorithm:

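Run `apriori` on the transactions with the user-supplied thresholds. apyori applies all three in one call: itemsets below `min_support` are discarded, and for each frequent itemset only the rules that meet `min_confidence` and `min_lift` are kept. The function returns a generator, so the work happens when its results are consumed.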

```python
# association_rules = apriori(trx, min_support=0.2, min_confidence=0.7, min_lift=1)
association_rules = apriori(trx, min_support=min_support, min_confidence=min_confidence, min_lift=min_lift)
```

# 5. Extract Association Rules:

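Consume the generator into a list. Each element is an apyori `RelationRecord` with the fields `items` (the frequent itemset), `support`, and `ordered_statistics`: the rules derived from that itemset that passed the confidence and lift thresholds, each given as `items_base` (antecedent), `items_add` (consequent), `confidence`, and `lift`.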

```python
association_results = list(association_rules)
```

# 6. Display Results:

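Build a DataFrame listing each association rule with its support and confidence. For an itemset, apyori can also report a degenerate rule with an empty antecedent (`[] -> [items]`, whose lift is always 1); such rules say nothing about relationships between items and are left out.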

```python
# Show full rule text instead of truncating long cells
pd.set_option('display.max_colwidth', 1000)

rows = []
for record in association_results:
    # record.support is the support of the whole itemset behind each rule
    for stat in record.ordered_statistics:
        # Skip the degenerate rule with an empty antecedent ([] -> itemset)
        if not stat.items_base:
            continue
        rows.append({
            'Rule': str(list(stat.items_base)) + " -> " + str(list(stat.items_add)),
            'Support': str(round(record.support * 100, 2)) + '%',
            'Confidence': str(round(stat.confidence * 100, 2)) + '%',
        })

Result = pd.DataFrame(rows, columns=['Rule', 'Support', 'Confidence'])
Result
```

|   | Rule | Support | Confidence |
|---|------|---------|------------|
| 0 | ['Beer'] -> ['Chips'] | 30.0% | 75.0% |
| 1 | ['Chips'] -> ['Beer'] | 30.0% | 100.0% |
| 2 | ['Soda'] -> ['Beer'] | 20.0% | 100.0% |
| 3 | ['Eggs'] -> ['Bread'] | 20.0% | 100.0% |
| 4 | ['Bread'] -> ['Milk'] | 30.0% | 75.0% |
| 5 | ['Milk'] -> ['Bread'] | 30.0% | 75.0% |
| 6 | ['Detergent'] -> ['Diapers'] | 20.0% | 100.0% |
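The results table omits lift, but it can be recovered from the reported numbers. As a worked example for the Beer/Chips pair (using the table's rounded percentages), rearrange the confidence formula to $\text{Support}(X) = \text{Support}(X \cup Y) / \text{Confidence}(X \Rightarrow Y)$ and substitute into the lift formula:

```python
# Values read off the results table (rounded percentages)
support_beer_chips = 0.30   # Support({Beer, Chips}), shared by both rules
conf_beer_to_chips = 0.75   # Confidence(Beer => Chips)
conf_chips_to_beer = 1.00   # Confidence(Chips => Beer)

# Rearranging Confidence(X => Y) = Support(X ∪ Y) / Support(X)
support_beer = support_beer_chips / conf_beer_to_chips    # about 0.40
support_chips = support_beer_chips / conf_chips_to_beer   # 0.30

lift = support_beer_chips / (support_beer * support_chips)
print(round(lift, 2))  # 2.5
```

Because lift is symmetric in $X$ and $Y$, Chips -> Beer has the same lift of 2.5, comfortably above the minimum lift of 1 entered earlier.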
