## Market Basket Analysis

### Step 1: Importing the required libraries

In [None]:
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

### Step 2: Loading and exploring the data

In [None]:
# Load the dataset
file_path = 'groceries.csv'
df = pd.read_csv(file_path)

In [None]:
# Display the first 10 rows of the dataset
df.head(10)

In [None]:
# Checking for missing values
missing_values = df.isnull().sum()
print(missing_values)

In [None]:
df.shape  # (number of rows, number of columns)

### Data Preparation for Market Basket Analysis

The following step is a critical phase of Market Basket Analysis. The raw transactional data must be transformed into the format the Apriori algorithm expects.

That format is a table with one row per transaction and one column per distinct item. A 1 (True) means the item appears in that transaction, and a 0 (False) means it does not.

### Step 3: Converting the data into a suitable format for analysis

In [None]:
# 1. Split each comma-separated string in the Items column into a list of items
transactions = df['Items'].apply(lambda t: t.split(','))

print(transactions)
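The lambda above relies on Python's `str.split(',')`. Here is a minimal stdlib sketch of the same step applied to two hypothetical transaction strings that stand in for `df['Items']`. The helper name `split_transaction` is hypothetical. Stripping whitespace and dropping empty items are assumptions added for robustness, for example against `"a, b"` or a trailing comma. They are not part of the original cell.

```python
def split_transaction(raw, sep=","):
    """Split one comma-separated transaction string into a list of item names.

    Stripping whitespace and dropping empty strings are defensive
    assumptions, not part of the original notebook cell.
    """
    return [item.strip() for item in raw.split(sep) if item.strip()]


# Hypothetical raw values standing in for df['Items']
raw_items = ["whole milk,yogurt,rolls/buns", "citrus fruit, margarine,"]
transactions = [split_transaction(t) for t in raw_items]
print(transactions)
# → [['whole milk', 'yogurt', 'rolls/buns'], ['citrus fruit', 'margarine']]
```

If the real file has no spaces after commas and no trailing commas, the plain `t.split(',')` in the notebook gives the same result.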

In [None]:
# 2. Convert the Series of item lists into a plain Python list of lists
transactions = list(transactions)

### One-Hot Encoding and Apriori Algorithm
Next we apply mlxtend's TransactionEncoder, which converts the list of item lists into a one-hot encoded Boolean array for frequent itemset mining.

In [1]:
# Create a TransactionEncoder instance (requires the imports from Step 1)
l = TransactionEncoder()

The fit method of the TransactionEncoder learns the unique labels present in the dataset, and through the transform method, it converts the input dataset (a Python list of lists) into a NumPy boolean array using one-hot encoding.
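The following simplified pure-Python stand-in makes the two phases concrete. It is not mlxtend's implementation. The class name `SimpleTransactionEncoder` is hypothetical, the class returns a list of lists rather than a NumPy array, and skipping items that were not seen during `fit` is a choice made for this sketch only.

```python
class SimpleTransactionEncoder:
    """Hypothetical, simplified stand-in for mlxtend's TransactionEncoder.

    fit() learns the sorted set of unique items (stored in columns_);
    transform() turns each transaction into a row of booleans.
    """

    def fit(self, transactions):
        # Learn the vocabulary: every distinct item, in sorted order
        self.columns_ = sorted({item for t in transactions for item in t})
        self._index = {item: i for i, item in enumerate(self.columns_)}
        return self  # returning self allows fit(...).transform(...) chaining

    def transform(self, transactions):
        rows = []
        for t in transactions:
            row = [False] * len(self.columns_)
            for item in t:
                # Items not seen during fit are skipped in this sketch
                if item in self._index:
                    row[self._index[item]] = True
            rows.append(row)
        return rows


transactions = [["milk", "bread"], ["bread", "butter"], ["milk"]]
enc = SimpleTransactionEncoder()
encoded = enc.fit(transactions).transform(transactions)
print(enc.columns_)  # → ['bread', 'butter', 'milk']
for row in encoded:
    print(row)
# → [True, False, True]
# → [True, True, False]
# → [False, False, True]
```

Each row lines up with `columns_`, which is why the notebook passes `l.columns_` as the column labels when it builds the DataFrame.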

In [None]:
l_data = l.fit(transactions).transform(transactions)

Convert the encoded array into a pandas DataFrame:

In [None]:
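df = pd.DataFrame(l_data, columns=l.columns_)
# Keep the boolean dtype: apriori accepts True/False columns directly,
# and recent mlxtend versions warn that non-bool input is slower
df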

In [None]:
# Apply the Apriori algorithm to find frequent itemsets.
# min_support=0.01 keeps itemsets that appear in at least 1% of transactions.
df = apriori(df, min_support=0.01, use_colnames=True, verbose=1)
df
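The `min_support` threshold is expressed as a fraction of all transactions. The sketch below illustrates the level-wise idea behind Apriori in plain Python. It grows candidate itemsets one item at a time. It also prunes any candidate that has an infrequent subset, because every subset of a frequent itemset must itself be frequent. This is a teaching sketch with a hypothetical function name (`apriori_sketch`) and toy data. It is not mlxtend's implementation and would be much slower on the real dataset.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for all itemsets with support >= min_support."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]

    def support(itemset):
        # Fraction of transactions that contain every item in itemset
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items
    current = {}
    for i in sorted({i for b in baskets for i in b}):
        s = support(frozenset([i]))
        if s >= min_support:
            current[frozenset([i])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-item candidates
        prev = list(current)
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count support only for the surviving candidates
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


transactions = [
    ["milk", "bread"],
    ["milk", "butter"],
    ["milk", "bread", "butter"],
    ["bread"],
]
result = apriori_sketch(transactions, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
# → ['bread'] 0.75
# → ['butter'] 0.5
# → ['milk'] 0.75
# → ['bread', 'milk'] 0.5
# → ['butter', 'milk'] 0.5
```

The triple {milk, bread, butter} is never counted. Its subset {bread, butter} has support 0.25, so the prune step removes the candidate first. This pruning is what keeps Apriori tractable.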

In [None]:
# Generate association rules from the frequent itemsets,
# keeping only rules with confidence >= 0.7
df_ar = association_rules(df, metric="confidence", min_threshold=0.7)
df_ar

The resulting table highlights four product combinations that are frequently bought together:
- cereals and whole milk
- margarine, rolls/buns and whole milk
- root vegetables, sausage and rolls/buns
- fruit/vegetable juice, whole milk and yogurt

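For example, the first row of the rules table (index 0) is the rule frozen dessert → whole milk:
- Its confidence is 0.8: 80% of customers who buy frozen dessert also buy whole milk.
- Its lift is 3.7: customers who buy frozen dessert are about 3.7 times as likely to buy whole milk as customers in general. Lift measures association relative to independence; it is not a correlation coefficient.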
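The 80% and 3.7 figures correspond to the rule's confidence and lift. For a rule A → C, let supp(X) be the fraction of transactions that contain every item in X. The standard definitions are given below, together with the arithmetic the quoted values imply. Because those values are rounded, the last line is only approximate.

```latex
\begin{aligned}
\operatorname{conf}(A \rightarrow C) &= \frac{\operatorname{supp}(A \cup C)}{\operatorname{supp}(A)} \\
\operatorname{lift}(A \rightarrow C) &= \frac{\operatorname{conf}(A \rightarrow C)}{\operatorname{supp}(C)} \\
\operatorname{supp}(\text{whole milk}) &= \frac{\operatorname{conf}}{\operatorname{lift}} \approx \frac{0.8}{3.7} \approx 0.22
\end{aligned}
```

If the reported values are accurate, whole milk appears in roughly 22% of all transactions. Among frozen-dessert buyers, that share rises to 80%. A lift above 1 indicates this kind of positive association.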