# CS145 Introduction to Data Mining - Assignment 4  
**Deadline: 11:59PM, May 14, 2025**

## Instructions
Each assignment is structured as a Jupyter notebook, offering interactive tutorials that align with our lectures. You will encounter two types of problems: *write-up problems* and *coding problems*.

1. **Write-up Problems:** These problems are primarily theoretical, requiring you to demonstrate your understanding of lecture concepts and to provide mathematical proofs for key theorems. Your answers should include sufficient steps for the mathematical derivations.
2. **Coding Problems:** Here, you will be engaging with practical coding tasks. These may involve completing code segments provided in the notebooks or developing models from scratch.

To ensure clarity and consistency in your submissions, please adhere to the following guidelines:

* For write-up problems, use Markdown bullet points to format text answers. Also, express all mathematical equations using $\LaTeX$ and avoid plain text such as `x0`, `x^1`, or `R x Q` for equations.
* For coding problems, comment your code thoroughly for readability and ensure it is executable. Non-runnable code may lead to a loss of **all** points. Coding problems have automated grading, and altering the grading code will result in a deduction of **all** points.
* Your submission should show the entire process of data loading, preprocessing, model implementation, training, and result analysis. This can be achieved through a mix of explanatory text cells, inline comments, intermediate result displays, and experimental visualizations.

### Submission Requirements

* Submit your solutions through GradeScope in BruinLearn.
* Late submissions are allowed up to 24 hours post-deadline with a penalty factor of $\mathbf{1}(t \le 24)e^{-(\ln(2)/12)t}$, where $t$ is the number of hours past the deadline.
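
As a quick sanity check of the late-penalty factor, the sketch below evaluates it for a few values of $t$, assuming $t$ is measured in hours after the deadline. The function name `late_penalty_factor` is hypothetical and used for illustration only.

```python
import math

def late_penalty_factor(hours_late: float) -> float:
    """Multiplier applied to the score; assumes t is in hours past the deadline."""
    if hours_late <= 0:
        return 1.0  # on time: no penalty
    if hours_late > 24:
        return 0.0  # the indicator 1(t <= 24) is zero beyond 24 hours
    return math.exp(-(math.log(2) / 12) * hours_late)

for t in (0, 6, 12, 24, 25):
    print(f"t = {t:>2} h -> factor {late_penalty_factor(t):.3f}")
```

The factor halves every 12 hours, so under this reading a submission 12 hours late keeps half its score and one 24 hours late keeps a quarter.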

### Collaboration and Integrity

* Collaboration is encouraged, but all final submissions must be your own work. Please acknowledge any collaboration or external sources used, including websites, papers, and GitHub repositories.
* Any suspected cases of academic misconduct will be reported to the Office of the Dean of Students.

---

## Outline

- **Part 1: Write-up**

  1. [EM on GMM (Proof Question)](#writeup-q1)

  2. [Set Data (By-Hand Question)](#writeup-q2)

  3. [CNN Convolution (By-Hand Calculation)](#writeup-q3)

  4. [PrefixSpan Question](#writeup-q4)

  5. [Sequence Alignment Question](#writeup-q5)

- **Part 2: Coding**

  6. [Gaussian Mixture Model on Real Data](#coding-q4)

  7. [Implementing the Apriori Algorithm](#coding-q5)

  8. [Implementing a Convolutional Neural Network (CNN)](#coding-q6)



---

# Part 1: Write-up

<a name="writeup-q1"></a>
## 1) EM Derivations for a Gaussian Mixture Model (20 points)

Consider a Gaussian Mixture Model (GMM) with $K$ components, where each component $k$ has parameters $\pi_k, \mu_k, \Sigma_k$. Let the **posterior probability** (responsibility) be:
$$
\gamma_{nk} = p(z_n = k \mid x_n)
= \frac{\pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}
     {\sum_{k'=1}^{K} \pi_{k'} \mathcal{N}(x_n \mid \mu_{k'}, \Sigma_{k'})},
$$
where
$$
\mathcal{N}(x \mid \mu, \Sigma)
= (2\pi)^{-d/2}
  |\Sigma|^{-1/2}
  \exp\left\{-\frac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right\}.
$$
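
To make the definition of $\gamma_{nk}$ concrete, here is a minimal sketch for the one-dimensional case ($d = 1$, so $\Sigma_k$ reduces to a variance $\sigma_k^2$). The parameter values and helper names are made up for illustration.

```python
import math

def normal_pdf(x, mu, var):
    """Density of N(x | mu, var) in one dimension (d = 1)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def responsibilities(x, weights, means, variances):
    """Return [gamma_n1, ..., gamma_nK] for a single point x."""
    joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)  # denominator: sum over all components k'
    return [j / total for j in joint]

# Hypothetical two-component mixture
weights, means, variances = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]
gamma = responsibilities(2.0, weights, means, variances)
print(gamma)  # the point is equidistant from both means, so each gamma is 0.5
```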

**Question**: Prove that the M-step of EM for a GMM with soft assignments yields the following maximum likelihood estimates (MLEs):
$$
\pi_k = \frac{\sum_n \gamma_{nk}}{\sum_{k'}\sum_n \gamma_{nk'}},
\quad
\mu_k = \frac{\sum_n \gamma_{nk} x_n}{\sum_n \gamma_{nk}},
\quad
\Sigma_k = \frac{\sum_n \gamma_{nk} (x_n - \mu_k)(x_n - \mu_k)^\top}{\sum_n \gamma_{nk}}.
$$

In your **proof**, please:

1. Start from the **complete-data log-likelihood** expression (including the assignments $z_n$). (6 points)
2. Show that maximizing w.r.t. each $\pi_k, \mu_k, \Sigma_k$ yields the above formulas when $\gamma_{nk} = p(z_n = k \mid x_n)$. (10 points)
3. Include **sufficient intermediate steps** in your derivation (e.g., partial derivatives, normalizing constraints). (4 points)

**Hint**: You may use standard results for maximizing Gaussian likelihoods, but do show how the soft assignments $\gamma_{nk}$ appear in place of the usual indicator variables.


**[TODO: Write your responses here. ]**

**[We walked through this question during the discussion session. Refer to the discussion recording if you need to revisit it.]**


---

<a name="writeup-q2"></a>
## 2) Set Data (By-Hand Question) (20 points)

We have the following **5 transactions**:

| TID | Items                   |
|----:|:------------------------|
| 10  | Beer, Nuts, Diaper     |
| 20  | Beer, Coffee, Diaper   |
| 30  | Beer, Diaper, Eggs     |
| 40  | Nuts, Eggs, Milk       |
| 50  | Nuts, Coffee, Diaper, Eggs, Milk |

Assume $\mathrm{minsup} = 50\%$ (i.e., itemsets must appear in at least 50% of transactions).

1. **Frequent 1-Itemsets** (5 points):  
   - Count each individual item's frequency (absolute and relative). Which items are **frequent**?

2. **Candidate 2-Itemsets** (7 points):  
   - Generate all 2-itemset candidates and prune those not meeting $\mathrm{minsup}$. Show your manual support counting.

3. **3-Itemsets** (5 points):  
   - For completeness, list any 3-itemsets that are frequent under $\mathrm{minsup} = 50\%$, or state that there are none.

4. **Brief Commentary** (3 points):  
   - How many database scans did you perform by hand?  
   - Could you see any **shortcuts** (like the Apriori property) that saved you from enumerating everything?
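
A relative threshold translates into an absolute count by rounding $\mathrm{minsup} \times N$ up to the next integer. The sketch below shows this conversion together with a generic support counter. It uses a deliberately different toy database (hypothetical, not the one in the table), so it can serve as a way to check by-hand counts; the helper names are illustrative.

```python
import math

def min_count(minsup, n_transactions):
    """Smallest absolute support count that satisfies a relative minsup."""
    return math.ceil(minsup * n_transactions)

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))

# Hypothetical toy database, unrelated to the table above
toy = [
    {"Bread", "Jam"},
    {"Bread", "Tea"},
    {"Bread", "Jam", "Tea"},
    {"Tea"},
]
print(min_count(0.5, 5))                     # 3
print(support_count({"Bread", "Jam"}, toy))  # 2
```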


**[TODO: Write your responses here. ]**


---

<a name="writeup-q3"></a>
## 3) CNN Convolution (By-Hand Calculation) (20 points)

Consider a **single-channel** (grayscale) 5 $\times$ 5 input image $I$ and a **single** 3 $\times$ 3 filter $F$. Let the (row, column)-indexed pixels in $I$ be:

$$
I = \begin{bmatrix}
1 & 2 & 3 & 2 & 1 \\
2 & 3 & 4 & 3 & 2 \\
3 & 4 & 5 & 4 & 3 \\
2 & 3 & 4 & 3 & 2 \\
1 & 2 & 3 & 2 & 1
\end{bmatrix},
\quad
F = \begin{bmatrix}
1 & 0 & -1 \\
0 & 0 & 0 \\
-1 & 0 & 1
\end{bmatrix}.
$$

We will perform a **valid convolution** (no padding), with **stride = 1**.

**Task**:  
1. Write the formula for the convolution output $O(r,c)$ for a 2D input and kernel (5 points).  
2. Calculate **one** output cell in detail, e.g. $O(1,1)$ (using 1-based indexing for convenience). Show all multiplications and summations (7 points).  
3. Provide the final 3 $\times$ 3 output (8 points).  
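
For checking by-hand results, the following sketch implements a stride-1, no-padding 2D convolution in the sense commonly used by CNN layers (cross-correlation, i.e., the kernel is not flipped). That convention is an assumption here, so follow the lecture's definition if it differs. The input and kernel are made-up examples, not the $I$ and $F$ above.

```python
def conv2d_valid(image, kernel):
    """Stride-1 'valid' cross-correlation of two lists-of-lists."""
    H, W = len(image), len(image[0])
    kH, kW = len(kernel), len(kernel[0])
    out_h, out_w = H - kH + 1, W - kW + 1  # output shrinks by (kernel size - 1)
    out = [[0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            out[r][c] = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kH)
                for j in range(kW)
            )
    return out

# Hypothetical 3x3 input and 2x2 kernel
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
print(conv2d_valid(image, kernel))  # [[-4, -4], [-4, -4]]
```

By the same size rule, a 5 $\times$ 5 input with a 3 $\times$ 3 kernel gives a $(5 - 3 + 1) \times (5 - 3 + 1) = 3 \times 3$ output.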


**[TODO: Write your responses here. ]**

---

<a name="writeup-q4"></a>
## 4) PrefixSpan Question (20 points)

**Given** the following small sequence database (with `SID` as sequence IDs), and a minimum support threshold of 2 (i.e., a subsequence must appear in at least 2 sequences to be considered frequent):

| SID | Sequence                      |
|----:|:------------------------------|
|  1  | `<(ab) c (ac) b>`            |
|  2  | `<(a) (bc) (ab)>`            |
|  3  | `<(ab) a (bc) (ac)>`         |
|  4  | `< b (ac) (ab) c>`           |

Each element is shown in parentheses (e.g., `(ab)`), and within an element, items are unordered. For instance, `(ab)` is the same as `(ba)`.
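
One way to make this notation concrete is to store each sequence as a list of sets, so that `(ab)` and `(ba)` compare equal. The sketch below, with hypothetical helper names and a made-up two-sequence database, checks whether a pattern is a subsequence of a sequence and counts its support.

```python
def is_subsequence(pattern, sequence):
    """True if each pattern element is a subset of a distinct, later element of sequence."""
    pos = 0
    for element in pattern:
        # Greedily match the earliest remaining element that contains this itemset
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern element must match strictly later
    return True

def support(pattern, database):
    """Number of sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)

# Hypothetical database: <(xy) z> and <x (yz) x>
db = [
    [frozenset("xy"), frozenset("z")],
    [frozenset("x"), frozenset("yz"), frozenset("x")],
]
print(frozenset("xy") == frozenset("yx"))              # True: order within an element is irrelevant
print(support([frozenset("x"), frozenset("z")], db))   # 2
print(support([frozenset("xy")], db))                  # 1
```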

### Tasks:

1. **Frequent Single-Item Sequences** (5 points)
   - Identify all length-1 (single-item) subsequences that meet the minimum support of 2.  
   - List their support counts.

2. **Prefix Projection** (7 points)
   - Pick **one** frequent single-item prefix (e.g., `<a>` or `<b>` — whichever is frequent) and construct its **projected database**. Show **how** you derive these projected sequences (i.e., how you remove the prefix and keep the remainder as the “suffix”).

3. **Frequent 2-Item Sequences** (8 points)
   - Using the projected database from step (2), find **all** possible 2-item extensions of that prefix that are still frequent.  
   - You do *not* need to enumerate every possible prefix in the entire database. Focus on demonstrating the prefix-projection mechanism clearly for **one** prefix.


**[TODO: Write your responses here. ]**

---

<a name="writeup-q5"></a>
## 5) Sequence Alignment Question (20 points)

**Given** two DNA sequences:

$$
X = \text{`GCATGCG`}
$$
$$
Y = \text{`CATTAGA`}
$$

Use the *Needleman-Wunsch* algorithm (global sequence alignment via dynamic programming) with the following scoring scheme:

- **Match**: +1  
- **Mismatch**: -1  
- **Gap**: -1  

(You may use any table size or approach from the lecture notes.)

### Tasks:

1. **Fill Out the DP Table** (8 points)
   - Construct an $(m+1) \times (n+1)$ matrix (where $m$ and $n$ are the lengths of $X$ and $Y$, respectively).  
   - Show how you compute each cell $F(i,j)$ by taking the maximum of:
     1. $F(i-1, j) + (\text{gap})$,  
     2. $F(i, j-1) + (\text{gap})$,  
     3. $F(i-1, j-1) + s(x_i, y_j)$,  
     where $s(x_i, y_j)$ is +1 if $x_i$ and $y_j$ match, and -1 otherwise.

2. **Backtracking** (7 points)
   - Once the table is completed, trace **back** from the bottom-right corner to retrieve **one optimal alignment** of $X$ and $Y$. Show your resulting alignment in a readable form (e.g., with dashes for gaps).

3. **Final Alignment & Score** (5 points)
   - Report the final alignment and the **optimal alignment score** $F(m,n)$.
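
The recurrence in task 1 can be sketched in a few lines. The version below fills the table and returns it. The scoring defaults mirror the scheme above, while the function name and the short example strings are illustrative assumptions rather than the assignment's sequences.

```python
def nw_table(x, y, match=1, mismatch=-1, gap=-1):
    """Fill the (m+1) x (n+1) Needleman-Wunsch table F for strings x and y."""
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = i * gap  # x[:i] aligned against gaps only
    for j in range(1, n + 1):
        F[0][j] = j * gap  # y[:j] aligned against gaps only
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j] + gap,    # x_i aligned with a gap
                F[i][j - 1] + gap,    # y_j aligned with a gap
                F[i - 1][j - 1] + s,  # x_i aligned with y_j
            )
    return F

F = nw_table("AC", "AGC")
for row in F:
    print(row)
print("score:", F[-1][-1])  # 1, e.g. A-C aligned with AGC
```

Backtracking then follows, from the bottom-right cell, whichever of the three terms produced each maximum.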


**[TODO: Write your responses here. ]**

---

# Part 2: Coding

Below are three coding problems. You can implement them in a single Jupyter notebook or in separate ones. **Please include** textual explanations in Markdown cells and keep the visualizations (e.g., plots, confusion matrices) as cell outputs in your notebook for clarity.

<a name="coding-q4"></a>
## 6) Gaussian Mixture Model on Real Data (20 points)

You will implement (or use a library for) a **Gaussian Mixture Model (GMM)** on the "make_moons" dataset from scikit-learn.

**Tasks**:
1. **Data Loading & Preprocessing** (3 points):
   - Generate the "make_moons" dataset using `sklearn.datasets.make_moons`.
   - The dataset creates two interleaving half-moon shapes, ideal for clustering visualizations.
   - Use parameters: `n_samples=300`, `noise=0.1`, `random_state=42`.
   - Optionally normalize or standardize features.

2. **Model Training** (8 points):
   - Either implement GMM from scratch (using E-step & M-step) **or** use an existing library (e.g., `sklearn.mixture.GaussianMixture`).
   - Try different numbers of components $K$ (e.g., $K=2,3,4$).
   - Experiment with different covariance types (`'full'`, `'tied'`, `'diag'`, `'spherical'`).

3. **Analysis & Visualization** (5 points):
   - Plot the data with cluster responsibilities or predicted labels.
   - Visualize cluster boundaries using contour plots.
   - Print or plot the means $\mu_k$ and mixture weights $\pi_k$.

4. **Discussion** (4 points):
   - How did you pick the optimal $K$?  
   - How well does GMM handle the non-Gaussian moon-shaped clusters?
   - Compare the performance of different covariance types.
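
When comparing values of $K$ and covariance types with criteria such as BIC or AIC, the penalty depends on the number of free parameters. A rough sketch of that count is below, assuming the usual parameterization ($K d$ mean entries, $K - 1$ free mixture weights, plus the covariance terms) and $\mathrm{BIC} = -2 \log L + p \log N$. The function names are hypothetical; for a fitted scikit-learn model, `GaussianMixture.bic` and `GaussianMixture.aic` report these criteria directly.

```python
import math

def gmm_n_parameters(n_components, n_features, covariance_type):
    """Free parameters of a GMM under each covariance structure."""
    K, d = n_components, n_features
    cov_params = {
        "full": K * d * (d + 1) // 2,  # one symmetric d x d matrix per component
        "tied": d * (d + 1) // 2,      # a single shared symmetric matrix
        "diag": K * d,                 # d variances per component
        "spherical": K,                # one variance per component
    }[covariance_type]
    return cov_params + K * d + (K - 1)  # + means + free mixture weights

def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion; lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)

for cov in ("full", "tied", "diag", "spherical"):
    print(cov, gmm_n_parameters(2, 2, cov))
```

For two components in two dimensions this prints 11, 8, 9, and 7 parameters respectively, which is why a more flexible covariance type needs a clearly better likelihood to win on BIC.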

**Starter Code with TODO Blocks**:


```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import adjusted_rand_score
import matplotlib.colors as colors

# Function to create a mesh grid for visualizing decision boundaries
def plot_decision_boundaries(X, model, ax=None):
    if ax is None:
        ax = plt.gca()

    # Create a mesh grid
    h = 0.02  # step size in the mesh
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

    # TODO: Predict labels for each point in mesh and visualize decision boundaries
    # Hint: Use model.predict() on the mesh grid points and reshape to match xx shape

    return ax

# Generate the make_moons dataset
X, y_true = make_moons(n_samples=300, noise=0.1, random_state=42)

# TODO: Preprocess the data (optional but recommended)
# Apply StandardScaler for better GMM performance
# scaler = StandardScaler()
# X_scaled = scaler.fit_transform(X)
# X = X_scaled  # Use scaled data

# Set parameters for model comparison
covariance_types = ['full', 'tied', 'diag', 'spherical']
n_components_list = [2, 3, 4]

# TODO: Set up a figure for the plots
# plt.figure(figsize=(15, 12))

# TODO: Create and train multiple GMM models with different configurations
# Loop through covariance types and number of components
# Keep track of the best model based on adjusted_rand_score
# best_ari = -1
# best_model = None
# best_config = None

# TODO: For each configuration:
# 1. Create and train a GMM model
# 2. Get predictions
# 3. Calculate metrics (ARI, BIC, AIC)
# 4. Plot results
# 5. Update best model if needed

# TODO: Print details of the best model

# TODO: Create a detailed visualization of the best model

# TODO: Provide analysis of the results
# 1. Discuss which K value worked best and why
# 2. Explain how GMM handles the non-Gaussian moon shapes
# 3. Compare different covariance types
```

**[TODO: Write your responses here.]**


---

<a name="coding-q5"></a>
## 7) Implementing the Apriori Algorithm (20 points)

Implement the **Apriori algorithm** for frequent itemset mining on the "Bakery" dataset.

**Tasks**:
1. **Load the dataset** (3 points):
   - The "Bakery" dataset contains 1,000 transactions from a bakery shop, providing a manageable size for this assignment.
   - You can download it from this URL: https://raw.githubusercontent.com/ngjiawaie/Extended_Bakery_Dataset/master/1000i.csv
   - Each row represents a transaction with three columns: Transaction ID, Item, and Quantity.

2. **Implement Apriori** from scratch (10 points):
   - Generate 1-itemsets and count their frequencies
   - For each k > 1, generate candidate k-itemsets from (k-1)-itemsets
   - Prune candidates using the Apriori property
   - Calculate support for remaining candidates
   - Continue until no new frequent itemsets are found

3. **Output** (4 points):
   - Print the **frequent itemsets** discovered (above your chosen minimum support threshold)
   - Generate **association rules** with their confidence/lift metrics

4. **Comment** (3 points):
   - Analyze the most interesting rules you discovered
   - Explain how the minimum support threshold affects your results
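
For reference, the rule metrics in task 3 follow directly from itemset supports: for a rule $A \Rightarrow B$, confidence is $\mathrm{supp}(A \cup B) / \mathrm{supp}(A)$, and lift is confidence divided by $\mathrm{supp}(B)$. The sketch below uses made-up relative supports and a hypothetical helper name.

```python
def rule_metrics(supp_ab, supp_a, supp_b):
    """Return (confidence, lift) of A => B from relative supports in [0, 1]."""
    confidence = supp_ab / supp_a
    lift = confidence / supp_b  # > 1 means A and B co-occur more often than if independent
    return confidence, lift

# Hypothetical supports: A in 10% of transactions, B in 20%, both in 5%
confidence, lift = rule_metrics(supp_ab=0.05, supp_a=0.10, supp_b=0.20)
print(f"confidence = {confidence:.2f}, lift = {lift:.2f}")  # confidence = 0.50, lift = 2.50
```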

**Starter Code (Apriori)**:

```python
import pandas as pd
import numpy as np
from collections import defaultdict
from itertools import combinations
import matplotlib.pyplot as plt
import seaborn as sns
import requests
from io import StringIO

# Load the Bakery dataset
def load_bakery_dataset(filepath):
    """
    Load the Bakery dataset from a CSV file.
    Returns a tuple (transactions, df): a list of transactions, where each
    transaction is a list of items, and the raw DataFrame.
    """
    try:
        # Try to load from local file first
        df = pd.read_csv(filepath, header=None, names=['TransactionID', 'Item', 'Quantity'])
    except FileNotFoundError:
        # If file not found, download it
        print(f"File not found: {filepath}. Attempting to download...")
        url = "https://raw.githubusercontent.com/ngjiawaie/Extended_Bakery_Dataset/master/1000i.csv"
        response = requests.get(url)
        if response.status_code == 200:
            content = StringIO(response.text)
            df = pd.read_csv(content, header=None, names=['TransactionID', 'Item', 'Quantity'])
            print("Dataset downloaded successfully.")
            # Save the file locally for future use
            with open(filepath, 'w') as f:
                f.write(response.text)
        else:
            raise Exception(f"Failed to download dataset: Status code {response.status_code}")

    # Group by TransactionID and collect items into lists
    transactions = df.groupby('TransactionID')['Item'].apply(list).tolist()

    return transactions, df

def generate_candidates(frequent_itemsets_k_minus_1, k):
    """
    Generate candidate k-itemsets from frequent (k-1)-itemsets
    """
    candidates = set()

    # TODO: Implement candidate generation
    # For k=2, generate pairs from individual items
    # For k>2, use the apriori principle: Two (k-1)-itemsets can be joined if they share k-2 items

    return candidates

def prune_candidates(candidates, frequent_itemsets_k_minus_1, k):
    """
    Prune candidate k-itemsets using the Apriori property:
    All subsets of a frequent itemset must also be frequent
    """
    pruned_candidates = set()

    # TODO: Implement candidate pruning
    # Generate all (k-1)-sized subsets of candidates
    # Keep candidate only if all its (k-1)-subsets are frequent

    return pruned_candidates

def count_itemsets_support(transactions, candidates):
    """
    Count the support for each candidate itemset
    """
    support_count = defaultdict(int)

    # TODO: Implement support counting
    # Count occurrences of each candidate in all transactions

    return support_count

def apriori(transactions, min_support):
    """
    Implement the Apriori algorithm

    Parameters:
    - transactions: List of transactions, where each transaction is a list of items
    - min_support: Minimum support threshold (between 0 and 1)

    Returns:
    - result: Dictionary of frequent itemsets with their support counts
    - stats: Dictionary with the number of candidate and frequent itemsets for each k
    """
    # Get unique items in the dataset
    unique_items = set()
    for transaction in transactions:
        for item in transaction:
            unique_items.add(item)

    print(f"Dataset has {len(transactions)} transactions with {len(unique_items)} unique items.")

    # Generate 1-itemsets
    itemsets = {frozenset([item]): 0 for item in unique_items}

    # TODO: Count support for 1-itemsets
    # Iterate through transactions and count occurrences of each item

    # Get frequent 1-itemsets
    min_count = min_support * len(transactions)
    frequent_itemsets = {k: v for k, v in itemsets.items() if v >= min_count}

    # Track results and statistics for visualization
    result = dict(frequent_itemsets)  # Initialize with 1-itemsets
    stats = {
        'k': [1],
        'candidates': [len(itemsets)],
        'frequent': [len(frequent_itemsets)]
    }

    # TODO: Implement the main Apriori loop
    # Start with k=2 and repeat until no more frequent itemsets are found:
    # 1. Generate k-itemset candidates
    # 2. Prune candidates
    # 3. Count support
    # 4. Filter to get frequent k-itemsets
    # 5. Update result and stats, then increment k

    return result, stats

def generate_association_rules(frequent_itemsets, transactions, min_confidence, min_lift=1.0):
    """
    Generate association rules from frequent itemsets

    Parameters:
    - frequent_itemsets: Dictionary of frequent itemsets with their support
    - transactions: List of transactions
    - min_confidence: Minimum confidence threshold (between 0 and 1)
    - min_lift: Minimum lift threshold (greater than or equal to 1.0)

    Returns:
    - List of rules as tuples (antecedent, consequent, support, confidence, lift)
    """
    rules = []
    total_transactions = len(transactions)

    # TODO: Implement association rule generation
    # 1. Consider only itemsets with at least 2 items
    # 2. For each itemset, generate all possible non-empty proper subsets as antecedents
    # 3. Calculate support, confidence, and lift for each rule
    # 4. Filter rules based on min_confidence and min_lift
    # 5. Sort rules by lift (descending)

    return rules

def visualize_results(stats, frequent_itemsets, rules):
    """
    Visualize the results of the Apriori algorithm
    """
    # Create a figure with subplots
    plt.figure(figsize=(15, 7))

    # TODO: Implement visualization
    # 1. Plot statistics (candidates and frequent itemsets vs k)
    # 2. Plot top frequent items
    # 3. Visualize rules if any were found

    plt.tight_layout()
    plt.savefig('apriori_visualization.png')
    plt.show()

def main():
    # Load the Bakery dataset
    print("Loading Bakery dataset...")
    transactions, bakery_df = load_bakery_dataset("1000i.csv")

    # Set parameters
    min_support = 0.02  # 2% minimum support
    min_confidence = 0.5  # 50% minimum confidence
    min_lift = 1.1  # Minimum lift threshold

    # Run Apriori algorithm
    print(f"\nRunning Apriori algorithm with min_support={min_support}...")
    frequent_itemsets, stats = apriori(transactions, min_support)

    # Print frequent itemsets
    print("\nFrequent Itemsets:")
    print(f"Found {len(frequent_itemsets)} frequent itemsets in total")

    # Generate association rules
    print(f"\nGenerating association rules with min_confidence={min_confidence}, min_lift={min_lift}...")
    rules = generate_association_rules(frequent_itemsets, transactions, min_confidence, min_lift)

    # Visualize results
    print("\nGenerating visualizations...")
    visualize_results(stats, frequent_itemsets, rules)

    # Add your analysis of the results
    print("\n----- Your Analysis Goes Here -----")
    # TODO: Add analysis of frequent itemsets and rules

if __name__ == "__main__":
    main()
```


**[TODO: Write your responses here. ]**


---

<a name="coding-q6"></a>
## 8) Implementing a Convolutional Neural Network (CNN) (20 points)

Use **PyTorch** (or another deep learning framework) to build a **simple CNN** for classification on a small image dataset (e.g., MNIST-like, CIFAR-10 subset, or any small custom dataset).

**Tasks**:
1. **Data Loading** (3 points):
   - Download or load a small dataset of images (e.g., from `torchvision.datasets`).
   - Split into train and test sets.

2. **Model Definition** (7 points):
   - Define a CNN with at least **one convolutional layer**, one pooling layer, and one fully connected layer at the end.

3. **Training & Evaluation** (7 points):
   - Train for a few epochs and print the training loss.
   - Evaluate on the test set and print the accuracy.

4. **Discussion** (3 points):
   - Did your CNN overfit on a small dataset?
   - (Optional) Experiment with more layers or data augmentation.
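
When sizing the fully connected layer, it helps to track how each convolution and pooling layer changes the spatial dimensions. Below is a small sketch of the standard output-size formula, $\lfloor (n + 2p - k)/s \rfloor + 1$, applied to a hypothetical layer stack on a 28 $\times$ 28 input (the MNIST image size). The layer choices and the channel count are assumptions for illustration.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

# Hypothetical stack: 3x3 conv (no padding) followed by 2x2 max pooling
n = 28
n = out_size(n, kernel=3)            # 26
n = out_size(n, kernel=2, stride=2)  # 13
channels = 16                        # assumed number of conv filters
print(n, channels * n * n)           # 13 2704 -> in_features of the first linear layer
```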

**Starter Code**:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
import numpy as np
import time

# Set random seed for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Check if CUDA is available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()

        # TODO: Define your CNN architecture
        # 1. Add at least one convolutional layer
        # 2. Add at least one pooling layer
        # 3. Add at least one fully connected layer
        # Remember to specify input/output dimensions appropriate for your dataset

    def forward(self, x):
        # TODO: Implement the forward pass
        # Connect the layers defined in __init__
        pass

def load_and_prepare_data():
    """
    Load and prepare the MNIST dataset
    """
    # TODO: Define data transformations and load dataset
    # 1. Set up appropriate transforms (ToTensor, Normalize, etc.)
    # 2. Load training and test datasets
    # 3. Create data loaders with appropriate batch sizes
    # 4. Get a batch of examples for visualization

    return train_loader, test_loader, example_data, example_targets

def visualize_data(example_data, example_targets):
    """
    Visualize sample images from the dataset
    """
    # TODO: Create a plot to visualize sample images with their labels
    pass

def train_model(model, train_loader, test_loader, num_epochs=5):
    """
    Train the CNN model
    """
    # TODO: Implement model training
    # 1. Define loss function and optimizer
    # 2. Set up training loop with epochs and batches
    # 3. In each epoch:
    #    - Train the model (forward, loss, backward, optimize)
    #    - Evaluate on test set
    #    - Track metrics
    # 4. Return training statistics

    return train_losses, test_losses, accuracies

def evaluate_model(model, test_loader):
    """
    Evaluate the model on the test set
    """
    # TODO: Implement model evaluation
    # 1. Set model to evaluation mode
    # 2. Calculate loss and accuracy on test set
    # 3. Return metrics

    return test_loss, accuracy

def visualize_results(train_losses, test_losses, accuracies):
    """
    Visualize training results
    """
    # TODO: Create plots to show:
    # 1. Training and test loss over epochs
    # 2. Test accuracy over epochs
    pass

def main():
    # Load and visualize data
    train_loader, test_loader, example_data, example_targets = load_and_prepare_data()
    visualize_data(example_data, example_targets)

    # Create the model
    model = SimpleCNN().to(device)
    print(model)

    # Train the model
    train_losses, test_losses, accuracies = train_model(model, train_loader, test_loader)

    # Visualize training results
    visualize_results(train_losses, test_losses, accuracies)

    # Final evaluation
    final_loss, final_accuracy = evaluate_model(model, test_loader)
    print(f"\nFinal accuracy on test set: {final_accuracy:.2f}%")

    # Discussion of results
    print("\n----- Discussion -----")
    # TODO: Add your discussion here addressing:
    # 1. Model architecture choice
    # 2. Analysis of results (accuracy, overfitting)
    # 3. Potential improvements

if __name__ == "__main__":
    main()
```



**[TODO: Write your responses here. ]**