# FP-Growth Algorithm in PySpark

This notebook demonstrates how to use the **FP-Growth** algorithm in PySpark for frequent pattern mining and association rule generation.

## Step 1: Set Up Spark Session
We first create a Spark session to work with PySpark.

In [None]:
from pyspark.sql import SparkSession

# Create Spark session
spark = SparkSession.builder \
    .appName("FP-Growth Example") \
    .getOrCreate()

## Step 2: Load and Prepare Data
We create a sample dataset where each transaction is represented as a list of items.

In [None]:
# Create sample transaction data
data = [
    (0, ["milk", "bread", "butter"]),
    (1, ["milk", "bread"]),
    (2, ["bread", "butter"]),
    (3, ["milk", "butter"]),
    (4, ["bread"])
]

# Convert to DataFrame
df = spark.createDataFrame(data, ["id", "items"])
df.show()

## Step 3: Apply FP-Growth Algorithm
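We use PySpark's `FPGrowth` class to find frequent itemsets and generate association rules. With `minSupport=0.4`, an itemset is frequent if it appears in at least 40% of the transactions (2 of the 5 here). With `minConfidence=0.6`, a rule is kept only if its confidence is at least 0.6.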

In [None]:
from pyspark.ml.fpm import FPGrowth

# Initialize FP-Growth model
fpGrowth = FPGrowth(itemsCol="items", minSupport=0.4, minConfidence=0.6)

# Train the model
model = fpGrowth.fit(df)

## Step 4: Display Frequent Itemsets
We display every itemset that meets the minimum support threshold. Each row holds the itemset (`items`) and the number of transactions that contain it (`freq`).

In [None]:
# Show frequent itemsets
model.freqItemsets.show()
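
As a cross-check on the table above, the support counts for this small dataset can be tallied without Spark. The following is a minimal sketch that enumerates every candidate itemset by brute force; it is not the FP-tree method that `FPGrowth` uses, and the helper name `frequent_itemsets` is hypothetical.

```python
from itertools import combinations

# The same five transactions as the sample DataFrame, as plain Python sets.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread"},
]
min_support = 0.4


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration (hypothetical helper, not Spark's FP-tree approach)."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            # Count the transactions that contain every item of the candidate.
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count / n >= min_support:
                result[candidate] = count
    return result


freq = frequent_itemsets(transactions, min_support)
for itemset, count in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
    print(itemset, count)
```

Brute force is only practical for a handful of distinct items, since the number of candidates doubles with each new item. It is useful here because the counts it prints should match the `freq` column above.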

## Step 5: Generate Association Rules
Association rules are derived from the frequent itemsets. Only rules whose confidence is at least `minConfidence` are kept.

In [None]:
# Show association rules
model.associationRules.show()
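
To see where the confidence values come from, the calculation can be reproduced by hand. This sketch assumes the standard definitions, confidence(A → B) = count(A ∪ B) / count(A) and lift = confidence / support(B), and uses support counts tallied manually from the five sample transactions. It considers only single-item consequents.

```python
# Support counts for the five sample transactions, tallied by hand.
n = 5
counts = {
    frozenset(["bread"]): 4,
    frozenset(["milk"]): 3,
    frozenset(["butter"]): 3,
    frozenset(["bread", "milk"]): 2,
    frozenset(["bread", "butter"]): 2,
    frozenset(["milk", "butter"]): 2,
}
min_confidence = 0.6

rules = []
for itemset, count in counts.items():
    if len(itemset) < 2:
        continue  # a rule needs both an antecedent and a consequent
    for consequent in itemset:
        antecedent = itemset - {consequent}
        confidence = count / counts[antecedent]
        if confidence >= min_confidence:
            consequent_support = counts[frozenset([consequent])] / n
            lift = confidence / consequent_support
            rules.append((sorted(antecedent), consequent, confidence, lift))

for antecedent, consequent, confidence, lift in sorted(rules):
    print(antecedent, "->", consequent, round(confidence, 3), round(lift, 3))
```

Rules with `bread` as the antecedent are dropped. `bread` appears in 4 transactions but co-occurs with `milk` or `butter` in only 2, so their confidence is 0.5, below the 0.6 threshold.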

## Step 6: Make Predictions
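`transform` applies the learned association rules to new transactions. For each row, the `prediction` column lists the consequents of every rule whose antecedent is contained in the row's items, excluding items the row already has.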

In [None]:
# Apply the association rules to new transactions
new_data = spark.createDataFrame([
    (5, ["milk", "bread"]),
    (6, ["butter", "bread"])
], ["id", "items"])

predictions = model.transform(new_data)
predictions.show()
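
The logic behind the `prediction` column can be sketched in plain Python. The rules below are the four that pass the 0.6 confidence threshold on the sample data, worked out by hand. The `predict` helper is hypothetical and only illustrates the matching step; it is not Spark's implementation.

```python
# Rules as (antecedent, consequent) pairs, derived by hand from the sample data.
rules = [
    ({"milk"}, "bread"),
    ({"milk"}, "butter"),
    ({"butter"}, "bread"),
    ({"butter"}, "milk"),
]


def predict(items, rules):
    """Collect consequents of matching rules that are not already in the basket."""
    items = set(items)
    predicted = []
    for antecedent, consequent in rules:
        rule_applies = antecedent <= items  # every antecedent item is present
        if rule_applies and consequent not in items and consequent not in predicted:
            predicted.append(consequent)
    return predicted


print(predict(["milk", "bread"], rules))    # ['butter']
print(predict(["butter", "bread"], rules))  # ['milk']
```

A basket containing only `bread` would get an empty prediction, because no surviving rule has `bread` alone as its antecedent.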

## Step 7: Stop the Spark Session
Finally, we stop the Spark session to release resources.

In [None]:
# Stop Spark session
spark.stop()