Association rule learning is a type of unsupervised learning that identifies interesting relations between variables in large databases. It aims to identify strong rules using measures of interestingness such as support and confidence. Given transactions that each contain a variety of items, association rules describe which items tend to occur together.

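The Apriori algorithm is a popular algorithm for association rule learning. It first finds all frequent item sets in the transaction database, that is, item sets whose support (the fraction of transactions that contain them) meets a minimum threshold. It then derives rules from these item sets and evaluates them with a confidence measure. The confidence of a rule A -> B is the estimated conditional probability that a transaction containing A also contains B: the support of A and B together divided by the support of A.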
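The measures used to score item sets and rules can be computed by hand on a small example. The sketch below uses hypothetical toy transactions (not the notebook's dataset) and plain Python; the function names are illustrative only.

```python
# Toy transactions (hypothetical data, not the notebook's dataset).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent)."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence relative to how common the consequent is on its own."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

bread_milk_support = support({"bread", "milk"}, transactions)            # 2 of 4 transactions
bread_to_milk_conf = confidence({"bread"}, {"milk"}, transactions)       # 0.5 / 0.75
bread_to_milk_lift = lift({"bread"}, {"milk"}, transactions)             # (2/3) / 0.75
```

A lift below 1, as in this toy case, suggests the two items appear together less often than they would if they were independent.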

The Apriori algorithm finds frequent item sets level by level, using a candidate generation and candidate elimination process. In the candidate generation process, it builds candidate item sets of size k by joining the frequent item sets of size k - 1, because an item set with an infrequent subset cannot itself be frequent. In the candidate elimination process, it counts how often each candidate occurs in the transaction database and discards those that fall below the minimum support threshold.
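The generate-then-eliminate loop can be sketched in a few lines of plain Python. This is a simplified illustration under the assumption that transactions fit in memory as sets; it is not how mlxtend implements `apriori`, and the name `apriori_sketch` is hypothetical.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset: support} for all item sets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def keep_frequent(candidates):
        # Candidate elimination: count each candidate and drop the rare ones.
        result = {}
        for c in candidates:
            s = sum(c <= t for t in transactions) / n
            if s >= min_support:
                result[c] = s
        return result

    singles = {frozenset([item]) for t in transactions for item in t}
    current = keep_frequent(singles)
    all_frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Candidate generation: join frequent (k-1)-item sets into k-item sets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune candidates that have any infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        all_frequent.update(current)
        k += 1
    return all_frequent

# Hypothetical toy data, not the notebook's dataset.
toy = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
frequent_sets = apriori_sketch(toy, min_support=0.5)
```

Here the three-item candidate is pruned without being counted, because its subset {milk, butter} is not frequent.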

The Apriori algorithm has a number of advantages, including:

It is efficient in finding frequent item sets.
It is able to handle large datasets.
It is able to find association rules with high confidence.

The Apriori algorithm also has a number of disadvantages, including:

It can be computationally expensive to find all frequent item sets.
It can be difficult to interpret the results of association rule learning.
It can be sensitive to noise in the data.

Here is a dataset that can be used for association rule learning:

 -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
The code provided mines association rules from a transaction dataset. The dataset consists of transactions recorded over a specific period of time and includes the following columns:

id: The unique identifier for each transaction.
transaction_date: The date when the transaction occurred.
transaction_amount: The amount spent in the transaction.
merchant_name: The name of the merchant where the transaction took place.
category: The category or type of the transaction, such as "Groceries", "Coffee", or "Streaming".

The code performs the following steps:

It loads the transaction dataset from a CSV file.
It preprocesses the transaction data, converting each row into a list of string items.
It applies the Apriori algorithm to identify frequent itemsets, which are sets of items that appear together frequently in the transactions. As written, every column value in a row (id, date, amount, merchant, and category) is treated as an item.
It generates association rules from the frequent itemsets. Association rules indicate relationships between items and provide insight into potential purchasing patterns.
It prints the discovered frequent itemsets and association rules for analysis and interpretation.

By running this code with the given dataset, you can uncover associations between the values recorded for a transaction, such as which categories are associated with specific merchants. This information can be valuable for various applications, including market basket analysis, recommendation systems, and targeted marketing strategies.

Import the necessary modules and functions (os, pandas, TransactionEncoder, apriori, association_rules) for data processing and association rule mining.


In [36]:
import os
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules


Get the current working directory using os.getcwd().


In [37]:
# Step 1: Get the current working directory
current_directory = os.getcwd()


Join the current directory path with the CSV file name to create the file path.


In [38]:
# Step 2: Join the current directory with the CSV file name
csv_file_path = os.path.join(current_directory, 'transaction_data.csv')


Load the CSV file using Pandas and store the data in the transaction_data variable.


In [40]:
# Step 3: Load the CSV file using Pandas
transaction_data = pd.read_csv(csv_file_path)


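Convert the DataFrame to a list of rows using .values.tolist(). Each row becomes one transaction, and every column value in it (including the id, date, and amount) becomes an item. This step prepares the data for further processing.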


In [41]:
# Step 4: Preprocess the transaction data
transaction_list = transaction_data.values.tolist()


Convert each item in the transaction data to a string format. This is necessary to ensure all items are in a consistent format for the TransactionEncoder to handle.


In [42]:
# Step 5: Convert the transaction data to a string format
transaction_list = [[str(item) for item in transaction] for transaction in transaction_list]
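Because the preprocessing above turns every column value into an item, ids and amounts show up in the itemsets printed further down. If only merchants and categories are of interest, one option is to build the item lists from those two columns alone. The sketch below uses plain Python rather than the notebook's DataFrame; the rows are hypothetical, with values reconstructed from the printed output, and the choice of `ITEM_COLUMNS` is an assumption.

```python
# Hypothetical rows shaped like the dataset described above. The values are
# reconstructed from the printed output and may not match the real CSV.
rows = [
    {"id": 1, "transaction_date": "6/6/2023", "transaction_amount": 100,
     "merchant_name": "Walmart", "category": "Groceries"},
    {"id": 2, "transaction_date": "6/6/2023", "transaction_amount": 50,
     "merchant_name": "Starbucks", "category": "Coffee"},
    {"id": 3, "transaction_date": "6/6/2023", "transaction_amount": 20,
     "merchant_name": "Netflix", "category": "Streaming"},
]

# Assumed choice: keep only the columns that are meaningful as items.
ITEM_COLUMNS = ("merchant_name", "category")

# One list of string items per row, restricted to the chosen columns.
selected_items = [[str(row[col]) for col in ITEM_COLUMNS] for row in rows]
```

With the notebook's DataFrame, the equivalent would be to select those columns before calling `.values.tolist()`.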


Create an instance of TransactionEncoder and encode the transaction data into a binary matrix using te.fit_transform(). This step transforms the transaction data into a format suitable for the Apriori algorithm.


In [43]:
# Step 6: Encode the transaction data into a binary matrix
te = TransactionEncoder()
te_ary = te.fit_transform(transaction_list)
transaction_df = pd.DataFrame(te_ary, columns=te.columns_)
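To make the encoding step concrete, the sketch below shows roughly what a one-hot transaction encoding produces: one column per distinct item and one boolean row per transaction. It is a plain-Python illustration of the idea, not mlxtend's implementation; `one_hot_encode` and the sample data are hypothetical.

```python
def one_hot_encode(transaction_list):
    """Return (columns, matrix) with one boolean row per transaction."""
    # One column per distinct item, in sorted order.
    columns = sorted({item for t in transaction_list for item in t})
    matrix = []
    for t in transaction_list:
        present = set(t)
        # True where the transaction contains the column's item.
        matrix.append([item in present for item in columns])
    return columns, matrix

# Hypothetical sample transactions.
sample = [["Walmart", "Groceries"], ["Starbucks", "Coffee"]]
columns, matrix = one_hot_encode(sample)
```

Each row of `matrix` answers, for every known item, whether that transaction contains it, which is the input shape the frequent itemset search works on.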


Apply the Apriori algorithm to find frequent itemsets using apriori(). Specify the minimum support threshold (min_support), and set use_colnames=True so that the itemsets are reported with the item names from the DataFrame columns rather than column indices.

In [44]:
# Step 7: Apply the Apriori algorithm to find frequent itemsets
frequent_itemsets = apriori(transaction_df, min_support=0.1, use_colnames=True)


Generate association rules from the frequent itemsets using association_rules(). Specify the desired metric for rule evaluation (metric) and set the minimum threshold for the metric (min_threshold).

In [45]:
# Step 8: Generate association rules from the frequent itemsets
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
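Rule generation can be pictured as splitting each frequent itemset into an antecedent and a consequent and keeping the splits whose confidence meets the threshold. The sketch below illustrates that idea in plain Python under the assumption that the supports of all subsets are available (which holds for frequent itemsets); it is not mlxtend's `association_rules` implementation, and the supports are hypothetical toy values.

```python
from itertools import combinations

def rules_from_itemsets(supports, min_confidence):
    """Return (antecedent, consequent, confidence) for each qualifying split."""
    rules = []
    for itemset, sup in supports.items():
        # Try every non-empty proper subset as the antecedent.
        for r in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), r):
                antecedent = frozenset(combo)
                conf = sup / supports[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules

# Hypothetical supports for a toy dataset, not the notebook's output.
toy_supports = {
    frozenset({"bread"}): 0.75,
    frozenset({"milk"}): 0.75,
    frozenset({"butter"}): 0.5,
    frozenset({"bread", "milk"}): 0.5,
    frozenset({"bread", "butter"}): 0.5,
}
toy_rules = rules_from_itemsets(toy_supports, min_confidence=0.7)
```

Only butter -> bread survives here: every transaction with butter also has bread, while the other splits have a confidence of about 0.67.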

Print the frequent itemsets using print(frequent_itemsets).

In [46]:
# Step 9: Print the frequent itemsets
print("Frequent Itemsets:")
print(frequent_itemsets)

Frequent Itemsets:
     support                                itemsets
0        0.2                                     (1)
1        0.2                                    (10)
2        0.2                                   (100)
3        0.2                                     (2)
4        0.2                                    (20)
..       ...                                     ...
146      0.2  (Groceries, Walmart, 6/6/2023, 1, 100)
147      0.2     (6/6/2023, Gas, 4, Gas Station, 10)
148      0.2    (Coffee, 6/6/2023, Starbucks, 2, 50)
149      0.2   (6/6/2023, 3, Netflix, Streaming, 20)
150      0.2     (5, 6/6/2023, Shopping, Amazon, 30)

[151 rows x 2 columns]
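The count of 151 itemsets can be checked by hand. The printed itemsets suggest five rows of five values each, all sharing the date 6/6/2023; this is an inference from the output, since the CSV itself is not shown. Each row contributes 2^5 - 1 = 31 non-empty subsets, and the one-item set holding only the date is shared by all five rows, so 5 x 31 - 4 = 151. The sketch below repeats that count with rows reconstructed under the same assumption.

```python
from itertools import combinations

# Rows reconstructed from the printed itemsets (an assumption, not the real CSV).
reconstructed_rows = [
    ["1", "6/6/2023", "100", "Walmart", "Groceries"],
    ["2", "6/6/2023", "50", "Starbucks", "Coffee"],
    ["3", "6/6/2023", "20", "Netflix", "Streaming"],
    ["4", "6/6/2023", "10", "Gas Station", "Gas"],
    ["5", "6/6/2023", "30", "Amazon", "Shopping"],
]

# Every non-empty subset of a row occurs in at least 1 of 5 transactions
# (support 0.2), which clears the min_support=0.1 threshold used above.
distinct_itemsets = {
    frozenset(combo)
    for row in reconstructed_rows
    for k in range(1, len(row) + 1)
    for combo in combinations(row, k)
}
itemset_count = len(distinct_itemsets)
```

With so few rows, every combination of values within a row counts as frequent, so the result mostly reflects the structure of the rows rather than purchasing patterns.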


Print the association rules using print(rules).

In [47]:
print("\nAssociation Rules:")
print(rules)


Association Rules:
      antecedents                       consequents  antecedent support  \
0             (1)                             (100)                 0.2   
1           (100)                               (1)                 0.2   
2             (1)                        (6/6/2023)                 0.2   
3     (Groceries)                               (1)                 0.2   
4             (1)                       (Groceries)                 0.2   
..            ...                               ...                 ...   
820  (Amazon, 30)           (5, Shopping, 6/6/2023)                 0.2   
821           (5)  (Amazon, Shopping, 30, 6/6/2023)                 0.2   
822    (Shopping)         (Amazon, 5, 30, 6/6/2023)                 0.2   
823      (Amazon)       (5, 30, Shopping, 6/6/2023)                 0.2   
824          (30)   (Amazon, 5, Shopping, 6/6/2023)                 0.2   

     consequent support  support  confidence  lift  leverage  conviction  
0   