# Model-Based Collaborative Filtering: Matrix Factorization with SVD

In this notebook, we implement a **model-based collaborative filtering** approach using **Matrix Factorization**. Unlike memory-based approaches, model-based methods involve a training phase to learn representations for users and items, enabling more scalable and generalizable recommendation systems.

---

## Memory-Based vs. Model-Based Collaborative Filtering

### Memory-Based Collaborative Filtering
- **Core Idea**: Relies on direct relationships in the data, such as user-user or item-item similarity.
- **How It Works**:
  - User-Based CF: Recommends items by finding similar users and aggregating their preferences.
  - Item-Based CF: Recommends items by finding similar items based on user interactions.
- **Key Characteristics**:
  - No explicit training phase.
  - Predictions are made on-the-fly by aggregating observed data.
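
As a rough illustration of the item-based variant, the sketch below scores unseen items for one user directly from item-item cosine similarities, with no training step. The toy matrix `R` and the helper name `item_based_scores` are assumptions made for this example and are not part of the notebook's pipeline.

```python
import numpy as np

# Toy user-item matrix (rows: users, columns: items); 0 means "not rated".
R = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [1.0, 0.0, 5.0, 4.0],
])

def item_based_scores(R, user):
    """Score every item for `user` as a similarity-weighted sum of their ratings."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0                      # avoid division by zero
    sim = (R.T @ R) / np.outer(norms, norms)     # item-item cosine similarity
    np.fill_diagonal(sim, 0.0)                   # an item should not vote for itself
    scores = sim @ R[user]
    scores[R[user] > 0] = -np.inf                # mask items the user already rated
    return scores

scores = item_based_scores(R, user=0)
print(int(np.argmax(scores)))  # index of the best unseen item for user 0
```

Every prediction recomputes or looks up similarities over the raw interaction data, which is the "on-the-fly" behaviour described above.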

### Model-Based Collaborative Filtering
- **Core Idea**: Builds a predictive model by learning latent representations of users and items.
- **How It Works**:
  - Decomposes the User-Item Interaction Matrix into low-dimensional representations (embeddings) for users and items.
  - Uses these embeddings to predict interactions between users and items.
- **Key Characteristics**:
  - Requires a training phase to learn the model parameters.
  - Generalizes to unobserved user-item pairs by capturing latent patterns.

### Key Differences Between Memory-Based and Model-Based Approaches

| **Aspect**            | **Memory-Based CF**                              | **Model-Based CF**                              |
|------------------------|--------------------------------------------------|------------------------------------------------|
| **Computation**        | On-the-fly; computes similarities dynamically.   | Pre-computes embeddings; predictions are faster. |
| **Training**           | No explicit training phase.                      | Explicit training phase to learn latent factors. |
| **Generalization**     | Limited to observed interactions.                | Generalizes well to unobserved interactions.     |
| **Sparsity Handling**  | Struggles with sparse data.                      | Learns latent patterns to handle sparsity better. |
| **Scalability**        | Limited scalability with large datasets.         | More scalable, though training can be costly.    |
| **Inference**          | Requires entire dataset for predictions.         | Can make predictions using pre-trained embeddings. |

---

## Matrix Factorization: The Core of Model-Based CF

### What is Matrix Factorization?
Matrix Factorization (MF) is a model-based approach that approximates the **User-Item Interaction Matrix** as the product of two smaller matrices:
- **User Latent Matrix** ($U$): Represents users in a latent feature space.
- **Item Latent Matrix** ($V$): Represents items in the same latent feature space.

### Mathematical Explanation
Given a User-Item Interaction Matrix $ R $, we approximate it as:

$$
R \approx U \cdot V^T
$$

Where:
- $ R $: Original interaction matrix ($n_{users} \times n_{items}$).
- $ U $: User latent feature matrix ($n_{users} \times k$).
- $ V $: Item latent feature matrix ($n_{items} \times k$).
- $ k $: Number of latent features (hyperparameter).

The goal is to minimize the difference between the observed interactions in $ R $ and the predicted interactions from $ U \cdot V^T $.
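
To make the shapes concrete, here is a minimal NumPy sketch of the approximation. The sizes and the random factors are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 4, 6, 2

U = rng.normal(size=(n_users, k))   # one k-dimensional row per user
V = rng.normal(size=(n_items, k))   # one k-dimensional row per item

R_hat = U @ V.T                     # full predicted matrix, shape (n_users, n_items)
single = U[1] @ V[3]                # prediction for user 1 and item 3 only

print(R_hat.shape)
print(np.isclose(R_hat[1, 3], single))
```

A single prediction only needs one row of $U$ and one row of $V$, so the full matrix never has to be materialized at inference time.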

### Loss Function
The optimization problem is defined as:

$$
\text{Loss} = \sum_{(u,i) \in R} \left( R_{u,i} - U_u \cdot V_i^T \right)^2 + \lambda \left( ||U||^2 + ||V||^2 \right)
$$

Where:
- First term: Squared error between actual and predicted interactions.
- Second term: Regularization term to prevent overfitting.
- $ \lambda $: Regularization hyperparameter.
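
The loss can be written down almost literally in code. This is a minimal sketch, assuming the observed interactions are given as `(user index, item index, rating)` triples; the function name `regularized_loss` and the toy values are illustrative only.

```python
import numpy as np

def regularized_loss(observed, U, V, lam):
    """Sum of squared errors over observed (user, item, rating) triples plus an L2 penalty."""
    squared_error = sum((r - U[u] @ V[i]) ** 2 for u, i, r in observed)
    penalty = lam * (np.sum(U ** 2) + np.sum(V ** 2))
    return squared_error + penalty

U = np.array([[1.0, 0.0], [0.0, 1.0]])
V = np.array([[2.0, 0.0], [0.0, 3.0]])
observed = [(0, 0, 2.0), (1, 1, 2.0)]   # (user index, item index, rating)

print(regularized_loss(observed, U, V, lam=0.1))
```

Only the observed pairs contribute to the error term, while the penalty covers all entries of $U$ and $V$.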

### Optimization Techniques
Matrix Factorization models are trained using optimization techniques like:
1. **Stochastic Gradient Descent (SGD)**: Iteratively updates $ U $ and $ V $ to minimize the loss function.
2. **Alternating Least Squares (ALS)**: Alternates between fixing $ U $ and $ V $, solving one matrix at a time.

---

## SVD (Singular Value Decomposition) for Matrix Factorization

### What is SVD?
SVD is a variant of Matrix Factorization that decomposes the User-Item Interaction Matrix into three matrices:

$$
R = U \cdot \Sigma \cdot V^T
$$

Where:
- $ U $: Orthogonal matrix representing users.
- $ \Sigma $: Diagonal matrix of singular values, representing the strength of latent features.
- $ V $: Orthogonal matrix representing items.

### Simplification for Recommendations
In practice, we truncate $ \Sigma $ to retain only the top $ k $ singular values (dimensionality reduction):

$$
R \approx U_k \cdot \Sigma_k \cdot V_k^T
$$

Where:
- $ U_k $: Top $ k $ user features.
- $ \Sigma_k $: Top $ k $ singular values.
- $ V_k $: Top $ k $ item features.
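
The truncation can be tried on a small dense matrix with `numpy.linalg.svd`. The matrix below is a made-up, fully observed example; it is an assumption for illustration and not the MovieLens data.

```python
import numpy as np

# Small dense matrix standing in for a fully observed rating matrix.
R = np.array([
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 1.0],
    [1.0, 1.0, 5.0],
    [2.0, 1.0, 4.0],
])

U, s, Vt = np.linalg.svd(R, full_matrices=False)  # s holds singular values, largest first

k = 2
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]       # rank-k approximation

# In the Frobenius norm, the approximation error equals the norm of the dropped singular values.
error = np.linalg.norm(R - R_k)
print(np.isclose(error, np.sqrt(np.sum(s[k:] ** 2))))
```

Keeping more singular values lowers the reconstruction error, at the cost of a larger model.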

### Why SVD?
- Captures latent user-item relationships effectively.
- Handles sparsity by focusing on the most significant latent features.
- Provides a mathematically robust way to factorize matrices.

---

## Why Use Matrix Factorization?

### Advantages
1. **Handles Sparse Data**: Learns latent patterns, making it robust to missing interactions.
2. **Scalable**: Efficient for large datasets when implemented with optimized libraries.
3. **Better Generalization**: Predicts unobserved interactions effectively.
4. **Customizable**: Can be extended with biases, side information, or implicit feedback.

### Challenges
1. **Cold-Start Problem**: Still struggles with new users or items.
2. **Hyperparameter Tuning**: Number of latent features ($k$) and regularization parameter ($\lambda$) must be carefully tuned.
3. **Interpretability**: Latent features are not always interpretable.

---

## Next Steps

In this notebook, we will:
1. Implement **Matrix Factorization** for recommendation using the **SVD variant**.
2. Train the model to learn user and item latent features.
3. Evaluate its performance using metrics like **Precision@K** and **Recall@K**.

By the end, we will have a scalable and generalizable recommendation system ready for further enhancements.


# Understanding Matrix Factorization: Learning vs. Calculation

In Matrix Factorization for recommendation systems, the factors (user and item latent matrices) are typically **learned through an iterative optimization process** rather than being calculated deterministically. This distinction arises from the nature of real-world data and the goals of recommendation systems.

---

## Why Are Factors Learned Instead of Calculated?

### 1. Traditional SVD: Deterministic Factorization
- **Singular Value Decomposition (SVD)** is a **deterministic mathematical method** used for matrix factorization.
- Given a complete and dense matrix $R$, SVD decomposes it into three matrices:
  $$
  R = U \cdot \Sigma \cdot V^T
  $$
  - $U$: Orthogonal matrix representing users.
  - $\Sigma$: Diagonal matrix of singular values.
  - $V$: Orthogonal matrix representing items.
- SVD provides a precise decomposition but requires $R$ to be fully populated (no missing entries).

---

### 2. Challenges with Real-World Data
- In recommendation systems, the User-Item Interaction Matrix $R$ is typically **sparse** and **incomplete**:
  - Most entries are missing because users interact with only a small subset of items.
  - Deterministic SVD cannot handle missing entries directly.

---

### 3. Learning the Factors
To handle sparsity and incompleteness, **Matrix Factorization** uses an **iterative learning process** to approximate the factors:
$$
R \approx U \cdot V^T
$$
- $U$: User latent feature matrix.
- $V$: Item latent feature matrix.
- The goal is to minimize the error between the observed interactions in $R$ and the predicted interactions from $U \cdot V^T$.

---

## How Factors Are Learned: Iterative Optimization

### 1. Objective Function
The factors are learned by minimizing an objective function that measures the error between the observed and predicted values:
$$
\text{Loss} = \sum_{(u,i) \in R} \left( R_{u,i} - U_u \cdot V_i^T \right)^2 + \lambda \left( ||U||^2 + ||V||^2 \right)
$$
- First term: Squared error for observed entries in $R$.
- Second term: Regularization to prevent overfitting.
- $\lambda$: Regularization parameter.

---

### 2. Optimization Techniques
To minimize this loss function, the following iterative optimization algorithms are commonly used:

#### **Stochastic Gradient Descent (SGD)**
- Iteratively updates the latent factors $U$ and $V$ to reduce the loss.
- Gradient updates (each step moves against the gradient, hence the minus sign):
  $$
  U_u \gets U_u - \eta \cdot \frac{\partial \text{Loss}}{\partial U_u}
  $$
  $$
  V_i \gets V_i - \eta \cdot \frac{\partial \text{Loss}}{\partial V_i}
  $$
  - $\eta$: Learning rate.
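
As a short worked step, assume the loss is taken for a single observed rating with prediction error $e_{u,i} = R_{u,i} - U_u \cdot V_i^T$. The gradients of that per-sample loss are:

$$
\frac{\partial \text{Loss}}{\partial U_u} = -2\, e_{u,i}\, V_i + 2\lambda\, U_u, \qquad
\frac{\partial \text{Loss}}{\partial V_i} = -2\, e_{u,i}\, U_u + 2\lambda\, V_i
$$

Stepping against these gradients and absorbing the constant factor 2 into $\eta$ gives the update form commonly used in SGD-based matrix factorization:

$$
U_u \gets U_u + \eta \left( e_{u,i}\, V_i - \lambda\, U_u \right), \qquad
V_i \gets V_i + \eta \left( e_{u,i}\, U_u - \lambda\, V_i \right)
$$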

#### **Alternating Least Squares (ALS)**
- Alternates between fixing $U$ and optimizing $V$, then fixing $V$ and optimizing $U$.
- Solves least-squares problems iteratively until convergence.
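
The alternating scheme can be sketched in a few lines. This is a minimal dense illustration under stated assumptions: `mask` marks observed entries, each per-user problem is solved as a ridge regression, and the names `als_update_users` and `observed_sse` are hypothetical helpers, not library functions.

```python
import numpy as np

def als_update_users(R, mask, U, V, lam):
    """With V fixed, solve a small ridge regression per user over that user's observed items."""
    k = V.shape[1]
    for u in range(R.shape[0]):
        items = np.flatnonzero(mask[u])          # items this user has rated
        if items.size == 0:
            continue                             # nothing to fit for this user
        V_u = V[items]                           # (n_rated, k)
        A = V_u.T @ V_u + lam * np.eye(k)
        b = V_u.T @ R[u, items]
        U[u] = np.linalg.solve(A, b)
    return U

def observed_sse(R, mask, U, V):
    """Sum of squared errors over observed entries only."""
    return float(np.sum(((R - U @ V.T) ** 2)[mask]))

rng = np.random.default_rng(0)
R = np.array([[5.0, 3.0, 0.0], [4.0, 0.0, 1.0], [0.0, 2.0, 5.0]])
mask = R > 0                                     # True where a rating is observed
U = rng.normal(scale=0.1, size=(3, 2))
V = rng.normal(scale=0.1, size=(3, 2))

before = observed_sse(R, mask, U, V)
for _ in range(10):
    U = als_update_users(R, mask, U, V, lam=0.1)
    V = als_update_users(R.T, mask.T, V, U, lam=0.1)   # same solver with the roles swapped
after = observed_sse(R, mask, U, V)
print(before, after)
```

Because each sub-problem has a closed-form solution and the per-user (or per-item) solves are independent, this scheme parallelizes naturally.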

---

### 3. Iterative Nature of Learning
- The optimization process starts with random initial values for $U$ and $V$.
- Each iteration adjusts the values to reduce the loss.
- The process stops when the error converges below a threshold or after a fixed number of iterations.

---

## Key Differences Between Deterministic and Learned Factorization

| **Aspect**                | **Traditional SVD (Deterministic)** | **Matrix Factorization for Recommendations (Learned)** |
|---------------------------|------------------------------------|--------------------------------------------------------|
| **Input Matrix**           | Complete and dense.               | Sparse and incomplete (missing entries).               |
| **Output**                 | Exact decomposition.              | Approximation of the matrix.                           |
| **Method**                 | Mathematical, closed-form.        | Iterative optimization (e.g., SGD, ALS).               |
| **Handling Missing Data**  | Cannot handle missing entries.    | Explicitly designed to work with missing entries.       |
| **Scalability**            | Limited for very large datasets.  | Scalable with distributed algorithms like ALS.          |

---

## Advantages of Learning Factors
1. **Handles Sparse Data**: Works effectively with incomplete matrices, focusing only on observed values during training.
2. **Flexibility**: Allows for the inclusion of constraints (e.g., regularization) and side information (e.g., implicit feedback).
3. **Scalability**: Can handle very large datasets using distributed or parallelized optimization techniques (e.g., ALS in Spark).

---

## Conclusion
In recommendation systems, **factorization is learned** through iterative optimization because real-world interaction data is sparse and incomplete. This approach approximates the User-Item Interaction Matrix $R$ and uncovers latent relationships between users and items. Deterministic methods like traditional SVD are unsuitable for sparse data, making learning-based Matrix Factorization the method of choice.

---


# Implementation Steps: Matrix Factorization for Recommendation

## Step 2: Import Libraries and Load Dataset
1. Import necessary libraries for data manipulation and matrix operations (e.g., Pandas, NumPy, etc.).
2. Load the dataset (e.g., MovieLens) containing user-item interactions (ratings).
3. Perform initial inspection of the dataset to understand its structure (e.g., columns for `userId`, `movieId`, `rating`, and potentially `timestamp`).

---

## Step 3: Data Preprocessing
1. Prepare the **User-Item Interaction Matrix**:
   - Create a pivot table with users as rows, items as columns, and ratings as values.
2. Handle missing values:
   - Fill missing entries with `0` (explicit feedback) or `NaN` (implicit feedback, to be handled differently in training).
3. Normalize the data:
   - Optionally subtract the mean rating of each user to center the data.
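
On a small dataset these preprocessing steps can be sketched with a pandas pivot table. The `toy` DataFrame is a hypothetical stand-in that only mimics the `userId`, `movieId`, `rating` column layout.

```python
import pandas as pd

# Hypothetical toy ratings in the same column layout as the MovieLens ratings file.
toy = pd.DataFrame({
    "userId":  [1, 1, 2, 3, 3],
    "movieId": [10, 20, 10, 20, 30],
    "rating":  [4.0, 5.0, 3.0, 2.0, 4.0],
})

# Users as rows, items as columns; unrated pairs come out as NaN.
matrix = toy.pivot_table(index="userId", columns="movieId", values="rating")

# Center each user's observed ratings on that user's mean (NaN entries are skipped).
centered = matrix.sub(matrix.mean(axis=1), axis=0)
print(centered)
```

Keeping unrated pairs as `NaN` at this stage makes it easy to tell "not rated" apart from a genuine numeric value.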

---

## Step 4: Model Implementation (Matrix Factorization)
1. Define the model parameters:
   - Number of latent features ($k$).
   - Learning rate ($\eta$) for optimization.
   - Regularization parameter ($\lambda$) to prevent overfitting.
2. Initialize latent matrices:
   - Randomly initialize the **User Latent Matrix** ($U$) and **Item Latent Matrix** ($V$) with appropriate dimensions.
3. Define the loss function:
   - Mean Squared Error (MSE) between observed and predicted ratings.
   - Add regularization terms to penalize large values in $U$ and $V$.
4. Implement the optimization algorithm:
   - **Stochastic Gradient Descent (SGD)**:
     - Update $U$ and $V$ iteratively based on observed ratings.
   - **Alternating Least Squares (ALS)** (optional):
     - Alternate between optimizing $U$ and $V$.
5. Train the model:
   - Iterate over the data for a fixed number of epochs.
   - Compute and track the loss at each epoch to monitor convergence.

---

## Step 5: Make Predictions
1. Compute the **Predicted Ratings Matrix**:
   - Use the dot product of the trained matrices $U$ and $V^T$ to predict ratings for all user-item pairs.
2. Convert the predicted ratings matrix into a user-friendly format (e.g., Pandas DataFrame).
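
Turning predicted scores into recommendations usually means ranking the items a user has not interacted with yet. The following is a minimal sketch with hand-picked factors; `top_n_for_user` and its arguments are illustrative names, not part of the notebook.

```python
import numpy as np

def top_n_for_user(U, V, user, seen_items, n=3):
    """Return the indices of the n highest-scoring items the user has not seen yet."""
    scores = U[user] @ V.T                  # predicted score for every item
    scores[list(seen_items)] = -np.inf      # exclude items already interacted with
    ranked = np.argsort(-scores)            # best first
    return ranked[:n]

U = np.array([[1.0, 0.0], [0.0, 1.0]])
V = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]])

print(top_n_for_user(U, V, user=0, seen_items={0}, n=2))
```

If the model was trained on mean-centered ratings, the user's mean would have to be added back to obtain scores on the original rating scale; the ranking itself is unaffected by that shift.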

---

## Step 6: Evaluate the Model
1. Define evaluation metrics:
   - **Precision@K**: Fraction of top-$K$ recommendations that are relevant.
   - **Recall@K**: Fraction of relevant items included in the top-$K$ recommendations.
2. Split the dataset into training and test sets:
   - Use the training set to build the model.
   - Use the test set to evaluate the recommendations.
3. Compute the evaluation metrics:
   - Compare the predicted ratings with the actual ratings in the test set.
   - Calculate Precision@K and Recall@K to assess the quality of recommendations.
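
Both metrics can be computed per user from a ranked recommendation list and a set of relevant items. This is a minimal sketch; what counts as "relevant" (for example, held-out items rated above some threshold) is an assumption that has to be fixed by the evaluation setup.

```python
def precision_recall_at_k(recommended, relevant, k):
    """recommended: ranked list of item ids; relevant: set of item ids the user actually liked."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

recommended = [10, 42, 7, 3, 99]     # model's ranking, best first
relevant = {42, 3, 5}                # e.g. held-out items rated above some threshold

print(precision_recall_at_k(recommended, relevant, k=3))
```

The per-user values are typically averaged over all users in the test set to obtain a single score.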

---

## Step 7: Conclusion and Next Steps
1. Summarize the performance of the Matrix Factorization model.
2. Discuss the limitations:
   - Cold-start problems for new users or items.
   - Sensitivity to hyperparameters (e.g., $k$, $\lambda$, $\eta$).
3. Suggest potential improvements:
   - Incorporate user/item biases into the model.
   - Use implicit feedback for recommendations.
   - Experiment with alternative optimization methods (e.g., ALS).
4. Provide an outlook to more advanced methods (e.g., deep learning-based collaborative filtering).

---

# Step 2: Import Libraries and Load Dataset
1. Import necessary libraries for data manipulation and matrix operations (e.g., Pandas, NumPy, etc.).
2. Load the dataset (e.g., MovieLens) containing user-item interactions (ratings).
3. Perform initial inspection of the dataset to understand its structure (e.g., columns for `userId`, `movieId`, `rating`, and potentially `timestamp`).

## Splitting the Data: Purpose and Approach

In a recommendation system, splitting the data into **training**, **validation**, and **test** sets is essential for evaluating the model's ability to generalize to unseen data. This ensures that the model is not simply memorizing historical interactions but is learning patterns that can predict future user-item interactions effectively.

---

### **Why Split the Data?**

1. **Generalization**:
   - The training set is used to learn the latent factors for users and items.
   - The validation and test sets simulate future interactions, allowing us to evaluate the model's ability to generalize beyond the data it was trained on.

2. **Avoiding Data Leakage**:
   - By splitting chronologically, we ensure that the model only uses past interactions to predict future ones. This avoids "peeking" into future data during training.

3. **Model Evaluation**:
   - Splitting the data provides a robust framework to measure the model's performance on:
     - **Validation Set**: Used to tune hyperparameters (e.g., learning rate, regularization).
     - **Test Set**: Provides an unbiased estimate of the model’s real-world performance.

---

### **Our Splitting Approach**

1. **Temporal Splitting**:
   - The dataset is split chronologically based on the `timestamp` of user interactions.
   - This ensures that:
     - Older interactions are in the training set.
     - Newer interactions are in the validation and test sets.

2. **User Presence**:
   - Every user is represented in the training set to ensure the model can generate predictions for all users in the validation and test sets.
   - Interactions for each user are split so that:
     - Early interactions go to the training set.
     - Middle interactions go to the validation set.
     - Recent interactions go to the test set.

---

### **Advanced Techniques**

In real-world systems, new users, items, or updated ratings often appear dynamically. While these scenarios are not directly addressed in this notebook, they can be handled with advanced techniques such as:
- **Online Updates** for user and item latent factors.
- **Pretrained Embeddings** for cold-start scenarios.
- **Hybrid Models** that combine collaborative filtering with content-based or neural approaches.

These methods allow recommendation systems to adapt dynamically to changes in data without full retraining.

---

By splitting the data appropriately and understanding its temporal structure, we ensure a realistic evaluation of our model's predictive capabilities while laying the foundation for extending it to more advanced dynamic scenarios.


In [17]:
import pandas as pd

# Load the dataset
ratings = pd.read_csv('data/ml-latest-small/ratings.csv')

# Sort the dataset by userId and timestamp
ratings = ratings.sort_values(by=['userId', 'timestamp'])

# Create empty lists to store the split data
train_data = []
val_data = []
test_data = []

# Define split ratios
train_ratio = 0.7  # 70% for training
val_ratio = 0.15  # 15% for validation
test_ratio = 0.15  # 15% for testing

# Split the data for each user
for user, group in ratings.groupby('userId'):
    n = len(group)
    
    # Users with too few interactions cannot be split three ways
    # (note: for n == 3, int() truncation still leaves the validation slice empty)
    if n < 3:
        # Fewer than 3 interactions: assign all of them to training
        train_data.append(group)
    else:
        train_end = int(train_ratio * n)
        val_end = int((train_ratio + val_ratio) * n)
        
        # Assign splits
        train_data.append(group.iloc[:train_end])  # Early interactions for training
        val_data.append(group.iloc[train_end:val_end])  # Middle interactions for validation
        test_data.append(group.iloc[val_end:])  # Recent interactions for testing

# Combine the split data into separate DataFrames
train_data = pd.concat(train_data)
val_data = pd.concat(val_data)
test_data = pd.concat(test_data)

# Output the sizes of the splits
print(f"Training set: {len(train_data)} interactions")
print(f"Validation set: {len(val_data)} interactions")
print(f"Test set: {len(test_data)} interactions")


Training set: 70312 interactions
Validation set: 15102 interactions
Test set: 15422 interactions


# Step 3: Data Preprocessing
Because real datasets can be far larger than a small practice set like ml-latest-small, we use **scipy**'s sparse matrix data structures instead of a dense pandas pivot table.


1. Prepare the **User-Item Interaction Matrix**:
   - Create a pivot table with users as rows, items as columns, and ratings as values.
2. Handle missing values:
   - Fill missing entries with `0` (explicit feedback) or `NaN` (implicit feedback, to be handled differently in training).
3. Normalize the data:
   - Optionally subtract the mean rating of each user to center the data.
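
For sparse matrices, centering needs some care: calling `.mean(axis=1)` on a sparse matrix divides by the total number of columns, so unrated items count as zeros. The sketch below is one possible way to subtract the mean of the observed ratings only; the helper name `center_by_observed_mean` and the toy matrix are assumptions for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

def center_by_observed_mean(matrix):
    """Subtract each user's mean over stored ratings only (unrated items are ignored)."""
    matrix = csr_matrix(matrix).astype(float)
    counts = np.diff(matrix.indptr)                     # stored ratings per user
    sums = np.asarray(matrix.sum(axis=1)).ravel()
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    centered = matrix.copy()
    centered.data -= np.repeat(means, counts)           # CSR stores data row by row
    return centered, means

toy = csr_matrix(np.array([[4.0, 0.0, 2.0], [0.0, 5.0, 0.0]]))
centered, means = center_by_observed_mean(toy)
print(means)          # per-user mean of the observed ratings
print(centered.data)  # centered ratings; explicit zeros stay stored
```

A centered rating can become exactly zero. Such entries remain stored in the CSR structure, but `nonzero()` no longer reports them, so downstream code should iterate over the stored entries (for example via `tocoo()`) rather than rely on `nonzero()`.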

In [18]:
from scipy.sparse import csr_matrix
import numpy as np

# Function to create a sparse User-Item Interaction Matrix
def create_sparse_matrix(data, user_col, item_col, value_col):
    users = data[user_col].unique()
    items = data[item_col].unique()
    user_map = {user: i for i, user in enumerate(users)}
    item_map = {item: i for i, item in enumerate(items)}
    
    # Map users and items to indices
    rows = data[user_col].map(user_map)
    cols = data[item_col].map(item_map)
    values = data[value_col]

    # Create a sparse matrix (compressed sparse row matrix)
    sparse_matrix = csr_matrix((values, (rows, cols)), shape=(len(users), len(items)))
    return sparse_matrix, user_map, item_map

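# Normalize the sparse matrix by subtracting a per-user mean from the stored ratings
def normalize_sparse_matrix(sparse_matrix):
    # Note: .mean(axis=1) on a sparse matrix averages over ALL columns, so unrated
    # items count as zeros. This is sum / n_items, not the mean of observed ratings.
    user_means = np.array(sparse_matrix.mean(axis=1)).flatten()
    
    # Subtract the mean from the stored (non-zero) entries only.
    # This relies on nonzero() returning indices in the same order as .data,
    # which holds for a CSR matrix without explicit zeros.
    row_indices, _ = sparse_matrix.nonzero()
    normalized_data = sparse_matrix.copy()
    normalized_data.data -= user_means[row_indices]
    
    return normalized_data, user_means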

# Step 1: Prepare Sparse Matrices for Train, Validation, and Test Sets
# Train
train_sparse, train_user_map, train_item_map = create_sparse_matrix(
    train_data, user_col='userId', item_col='movieId', value_col='rating'
)
train_normalized, train_user_means = normalize_sparse_matrix(train_sparse)

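# Validation
# Note: each call builds its own index maps, so row/column positions here are not
# guaranteed to match those of the training matrix.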
val_sparse, val_user_map, val_item_map = create_sparse_matrix(
    val_data, user_col='userId', item_col='movieId', value_col='rating'
)
val_normalized, val_user_means = normalize_sparse_matrix(val_sparse)

# Test
test_sparse, test_user_map, test_item_map = create_sparse_matrix(
    test_data, user_col='userId', item_col='movieId', value_col='rating'
)
test_normalized, test_user_means = normalize_sparse_matrix(test_sparse)

# Step 2: Print Summary of the Matrices
print("Sparse User-Item Interaction Matrices created and normalized.")
print(f"Train Matrix Shape: {train_normalized.shape}, Non-Zero: {train_sparse.nnz}")
print(f"Validation Matrix Shape: {val_normalized.shape}, Non-Zero: {val_sparse.nnz}")
print(f"Test Matrix Shape: {test_normalized.shape}, Non-Zero: {test_sparse.nnz}")

Sparse User-Item Interaction Matrices created and normalized.
Train Matrix Shape: (610, 7525), Non-Zero: 70312
Validation Matrix Shape: (610, 5129), Non-Zero: 15102
Test Matrix Shape: (610, 5556), Non-Zero: 15422
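
The three matrices above have different numbers of columns, which means column `j` of the validation matrix need not refer to the same movie as column `j` of the training matrix, while the learned item factors are indexed by training columns. One way to keep the indices aligned is sketched below, under the assumption that users and items unseen in training are simply dropped; `build_aligned_matrix` and the toy splits are hypothetical and not part of the notebook.

```python
import pandas as pd
from scipy.sparse import csr_matrix

def build_aligned_matrix(data, user_map, item_map, shape):
    """Build a sparse matrix for `data` using index maps fitted on the training split."""
    known = data["userId"].isin(list(user_map)) & data["movieId"].isin(list(item_map))
    data = data[known]                                   # drop users/items unseen in training
    rows = data["userId"].map(user_map).to_numpy()
    cols = data["movieId"].map(item_map).to_numpy()
    values = data["rating"].to_numpy(dtype=float)
    return csr_matrix((values, (rows, cols)), shape=shape)

# Hypothetical toy splits standing in for train_data and val_data.
train = pd.DataFrame({"userId": [1, 1, 2], "movieId": [10, 20, 10], "rating": [4.0, 3.0, 5.0]})
val = pd.DataFrame({"userId": [1, 2, 2], "movieId": [20, 20, 30], "rating": [2.0, 4.0, 1.0]})

user_map = {u: i for i, u in enumerate(train["userId"].unique())}
item_map = {m: i for i, m in enumerate(train["movieId"].unique())}
shape = (len(user_map), len(item_map))

val_aligned = build_aligned_matrix(val, user_map, item_map, shape)
print(val_aligned.shape, val_aligned.nnz)
```

With aligned matrices, a validation entry at `(row, col)` can be compared against `U[row] @ V[col]` without any index translation.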


# Step 4: Model Implementation (Matrix Factorization)
1. Define the model parameters:
   - Number of latent features ($k$).
   - Learning rate ($\eta$) for optimization.
   - Regularization parameter ($\lambda$) to prevent overfitting.
2. Initialize latent matrices:
   - Randomly initialize the **User Latent Matrix** ($U$) and **Item Latent Matrix** ($V$) with appropriate dimensions.
3. Define the loss function:
   - Mean Squared Error (MSE) between observed and predicted ratings.
   - Add regularization terms to penalize large values in $U$ and $V$.
4. Implement the optimization algorithm:
   - **Stochastic Gradient Descent (SGD)**:
     - Update $U$ and $V$ iteratively based on observed ratings.
   - **Alternating Least Squares (ALS)** (optional):
     - Alternate between optimizing $U$ and $V$.
5. Train the model:
   - Iterate over the data for a fixed number of epochs.
   - Compute and track the loss at each epoch to monitor convergence.

In [26]:
# we found a setting for hyperparameters that works well for this dataset
# Best Parameters Found:
# Learning Rate: 0.004099151484173524
# Regularization: 0.00010453688227089353
# Latent Factors: 27
# Best Validation Loss: 3.0988

# Define model parameters
num_users, num_items = train_normalized.shape
k = 27  # Number of latent features
learning_rate = 0.004099151484173524
lambda_reg = 0.00010453688227089353
num_epochs = 50

# Initialize latent factor matrices
np.random.seed(42)
U = np.random.normal(scale=1./k, size=(num_users, k))
V = np.random.normal(scale=1./k, size=(num_items, k))

# Define the loss function
def compute_mse_loss(sparse_matrix, U, V, lambda_reg):
    row_indices, col_indices = sparse_matrix.nonzero()
    predictions = np.sum(U[row_indices] * V[col_indices], axis=1)
    errors = sparse_matrix.data - predictions
    mse = np.mean(errors ** 2)
    regularization = lambda_reg * (np.sum(U**2) + np.sum(V**2))
    return mse + regularization

# Train the model using Stochastic Gradient Descent (SGD)
def train_mf_sgd(train_matrix, val_matrix, U, V, num_epochs, learning_rate, lambda_reg, patience=5):
    train_row_indices, train_col_indices = train_matrix.nonzero()
    best_val_loss = float('inf')
    best_epoch = 0
    patience_counter = 0
    best_U, best_V = U.copy(), V.copy()

    for epoch in range(num_epochs):
        # Training loop
        for idx in range(len(train_row_indices)):
            user = train_row_indices[idx]
            item = train_col_indices[idx]
            rating = train_matrix.data[idx]

            # Compute prediction and error
            prediction = np.dot(U[user], V[item])
            error = rating - prediction

            # Update user and item latent factors
            # (the item update below uses the already-updated U[user])
            U[user] += learning_rate * (error * V[item] - lambda_reg * U[user])
            V[item] += learning_rate * (error * U[user] - lambda_reg * V[item])

            # Clip the factor values (not the gradients) to prevent overflow
            U[user] = np.clip(U[user], -10, 10)
            V[item] = np.clip(V[item], -10, 10)

        # Compute losses after each epoch
        train_loss = compute_mse_loss(train_matrix, U, V, lambda_reg)
        val_loss = compute_mse_loss(val_matrix, U, V, lambda_reg)
        print(f"Epoch {epoch + 1}/{num_epochs}, Training Loss: {train_loss:.4f}, Validation Loss: {val_loss:.4f}")

        # Early stopping logic
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            best_epoch = epoch
            patience_counter = 0
            # Save the best U and V
            best_U, best_V = U.copy(), V.copy()
        else:
            patience_counter += 1
            if patience_counter >= patience:
                print(f"Early stopping triggered at epoch {epoch + 1}. Best Validation Loss: {best_val_loss:.4f}")
                break

    # Return the best U and V, the last epoch's training loss,
    # the best validation loss, and the epoch at which it occurred
    return best_U, best_V, train_loss, best_val_loss, best_epoch

# Train the model with the new train and validation sets
U, V, _, _, _ = train_mf_sgd(train_normalized, val_normalized, U, V, num_epochs, learning_rate, lambda_reg)

print("Matrix Factorization model trained successfully.")

# Evaluate on the test set
test_loss = compute_mse_loss(test_normalized, U, V, lambda_reg)
print(f"Test Loss: {test_loss:.4f}")


Epoch 1/50, Training Loss: 12.2973, Validation Loss: 12.6040
Epoch 2/50, Training Loss: 12.0734, Validation Loss: 12.5131
Epoch 3/50, Training Loss: 9.3553, Validation Loss: 11.2151
Epoch 4/50, Training Loss: 5.3561, Validation Loss: 8.3608
Epoch 5/50, Training Loss: 3.7644, Validation Loss: 6.5978
Epoch 6/50, Training Loss: 3.0361, Validation Loss: 5.5776
Epoch 7/50, Training Loss: 2.6448, Validation Loss: 4.9306
Epoch 8/50, Training Loss: 2.4071, Validation Loss: 4.4885
Epoch 9/50, Training Loss: 2.2508, Validation Loss: 4.1695
Epoch 10/50, Training Loss: 2.1424, Validation Loss: 3.9302
Epoch 11/50, Training Loss: 2.0645, Validation Loss: 3.7455
Epoch 12/50, Training Loss: 2.0071, Validation Loss: 3.5998
Epoch 13/50, Training Loss: 1.9639, Validation Loss: 3.4834
Epoch 14/50, Training Loss: 1.9309, Validation Loss: 3.3893
Epoch 15/50, Training Loss: 1.9055, Validation Loss: 3.3129
Epoch 16/50, Training Loss: 1.8857, Validation Loss: 3.2507
Epoch 17/50, Training Loss: 1.8703, Validati

## Parameter Search (Grid Search)
With the basic training loop, the resulting loss depends heavily on the hyperparameters and on the random initialization, so getting a good result on the first try is largely a matter of luck.

It is therefore advisable to automate the hyperparameter search, for example with Grid Search or, better, with Optuna.

In [24]:
from itertools import product
import numpy as np

# Define the grid of hyperparameters
learning_rates = [0.001, 0.01, 0.1]
regularization_params = [0.01, 0.1, 1.0]
latent_factors = [10, 20, 50]

# Generate all combinations of hyperparameters
param_grid = list(product(learning_rates, regularization_params, latent_factors))

# Function to evaluate a model
def evaluate_model(U, V, test_matrix):
    # Compute predictions
    predictions = np.dot(U, V.T)
    
    # Compute RMSE for the test set
    row_indices, col_indices = test_matrix.nonzero()
    errors = [
        (test_matrix[row, col] - predictions[row, col])**2
        for row, col in zip(row_indices, col_indices)
    ]
    return np.sqrt(np.mean(errors))

# Perform Grid Search
best_params = None
best_val_loss = float('inf')
results = []

for learning_rate, lambda_reg, k in param_grid:
    print(f"Testing parameters: learning_rate={learning_rate}, lambda_reg={lambda_reg}, k={k}")
    
    # Initialize latent factor matrices
    U = np.random.normal(scale=1./k, size=(train_normalized.shape[0], k))
    V = np.random.normal(scale=1./k, size=(train_normalized.shape[1], k))
    
    # Train the model (using the updated function with early stopping)
    U, V, train_loss, val_loss, best_epoch = train_mf_sgd(
        train_matrix=train_normalized,
        val_matrix=val_normalized,
        U=U,
        V=V,
        num_epochs=20,  # Maximum epochs
        learning_rate=learning_rate,
        lambda_reg=lambda_reg,
        patience=5  # Early stopping patience
    )
    
    # Store results
    results.append({
        'learning_rate': learning_rate,
        'lambda_reg': lambda_reg,
        'k': k,
        'train_loss': train_loss,
        'val_loss': val_loss,
        'best_epoch': best_epoch
    })
    
    # Update best parameters if validation loss improves
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        best_params = (learning_rate, lambda_reg, k)
        print(f"New best params: {best_params} with Validation Loss: {best_val_loss:.4f}")

# Print the best parameters
print("\nBest Parameters Found:")
print(f"Learning Rate: {best_params[0]}, Regularization: {best_params[1]}, Latent Factors: {best_params[2]}")
print(f"Best Validation Loss: {best_val_loss:.4f}")

# Optional: Convert results to DataFrame for analysis
results_df = pd.DataFrame(results)
print("\nGrid Search Results:")
print(results_df)

Testing parameters: learning_rate=0.001, lambda_reg=0.01, k=10
Epoch 1/20, Training Loss: 20.3762, Validation Loss: 20.6772
Epoch 2/20, Training Loss: 20.3743, Validation Loss: 20.6797
Epoch 3/20, Training Loss: 20.3789, Validation Loss: 20.6893
Epoch 4/20, Training Loss: 20.3908, Validation Loss: 20.7075
Epoch 5/20, Training Loss: 20.4118, Validation Loss: 20.7378
Epoch 6/20, Training Loss: 20.4464, Validation Loss: 20.7876
Early stopping triggered at epoch 6. Best Validation Loss: 20.6772
New best params: (0.001, 0.01, 10) with Validation Loss: 20.6772
Testing parameters: learning_rate=0.001, lambda_reg=0.01, k=20
Epoch 1/20, Training Loss: 16.3513, Validation Loss: 16.6480
Epoch 2/20, Training Loss: 16.3517, Validation Loss: 16.6507
Epoch 3/20, Training Loss: 16.3554, Validation Loss: 16.6571
Epoch 4/20, Training Loss: 16.3629, Validation Loss: 16.6681
Epoch 5/20, Training Loss: 16.3755, Validation Loss: 16.6859
Epoch 6/20, Training Loss: 16.3960, Validation Loss: 16.7151
Early stop

KeyboardInterrupt: 

## Parameter Search (Optuna)
Grid Search is slow and tedious because it trains a model for every combination in the grid. Let's try Optuna for the hyperparameter search instead, again training with early stopping.

In [25]:
import optuna
import numpy as np

# Define the training function with Optuna integration
def objective(trial):
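    # Suggest hyperparameters on a log scale
    # (suggest_float with log=True replaces the deprecated suggest_loguniform)
    learning_rate = trial.suggest_float('learning_rate', 1e-5, 1e-2, log=True)
    lambda_reg = trial.suggest_float('lambda_reg', 1e-4, 1e-1, log=True)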
    k = trial.suggest_int('k', 10, 50)
    
    # Initialize latent factor matrices
    U = np.random.normal(scale=1./k, size=(train_normalized.shape[0], k))
    V = np.random.normal(scale=1./k, size=(train_normalized.shape[1], k))
    
    # Train the model with early stopping
    num_epochs = 20
    patience = 5
    U, V, _, val_loss, _ = train_mf_sgd(
        train_matrix=train_normalized,
        val_matrix=val_normalized,
        U=U,
        V=V,
        num_epochs=num_epochs,
        learning_rate=learning_rate,
        lambda_reg=lambda_reg,
        patience=patience
    )
    
    # Return the validation loss (Optuna minimizes this value)
    return val_loss

# Create an Optuna study
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=50)

# Print the best hyperparameters and validation loss
print("\nBest Parameters Found:")
print(f"Learning Rate: {study.best_params['learning_rate']}")
print(f"Regularization: {study.best_params['lambda_reg']}")
print(f"Latent Factors: {study.best_params['k']}")
print(f"Best Validation Loss: {study.best_value:.4f}")

# Optional: Analyze study results
import pandas as pd
results_df = pd.DataFrame([
    {'trial': t.number, **t.params, 'val_loss': t.value}
    for t in study.trials
])
print("\nOptuna Study Results:")
print(results_df)


[I 2024-12-15 16:06:27,527] A new study created in memory with name: no-name-16dfab37-a54f-4256-a4a1-356b6f574dff


Epoch 1/20, Training Loss: 45.1992, Validation Loss: 45.5459
Epoch 2/20, Training Loss: 117.0074, Validation Loss: 118.8239
Epoch 3/20, Training Loss: 348.2541, Validation Loss: 351.4071
Epoch 4/20, Training Loss: 509.7894, Validation Loss: 512.5627
Epoch 5/20, Training Loss: 622.0148, Validation Loss: 624.4169


[I 2024-12-15 16:06:34,109] Trial 0 finished with value: 45.545910732771986 and parameters: {'learning_rate': 0.005958998949583688, 'lambda_reg': 0.09643064217921256, 'k': 25}. Best is trial 0 with value: 45.545910732771986.


Epoch 6/20, Training Loss: 707.8056, Validation Loss: 709.9302
Early stopping triggered at epoch 6. Best Validation Loss: 45.5459
Epoch 1/20, Training Loss: 13.6899, Validation Loss: 13.9876
Epoch 2/20, Training Loss: 13.6898, Validation Loss: 13.9884
Epoch 3/20, Training Loss: 13.6903, Validation Loss: 13.9901
Epoch 4/20, Training Loss: 13.6914, Validation Loss: 13.9926
Epoch 5/20, Training Loss: 13.6932, Validation Loss: 13.9963


[I 2024-12-15 16:06:40,704] Trial 1 finished with value: 13.987573344188096 and parameters: {'learning_rate': 0.0007425281450583539, 'lambda_reg': 0.006592702653314613, 'k': 38}. Best is trial 1 with value: 13.987573344188096.


Epoch 6/20, Training Loss: 13.6959, Validation Loss: 14.0016
Early stopping triggered at epoch 6. Best Validation Loss: 13.9876
Epoch 1/20, Training Loss: 12.3067, Validation Loss: 12.6077
Epoch 2/20, Training Loss: 12.2926, Validation Loss: 12.6041
Epoch 3/20, Training Loss: 12.2114, Validation Loss: 12.5744
Epoch 4/20, Training Loss: 11.6890, Validation Loss: 12.3611
Epoch 5/20, Training Loss: 9.5179, Validation Loss: 11.3615
Epoch 6/20, Training Loss: 6.5708, Validation Loss: 9.5387
Epoch 7/20, Training Loss: 4.8926, Validation Loss: 8.0240
Epoch 8/20, Training Loss: 3.9347, Validation Loss: 6.9570
Epoch 9/20, Training Loss: 3.3537, Validation Loss: 6.1955
Epoch 10/20, Training Loss: 2.9781, Validation Loss: 5.6347
Epoch 11/20, Training Loss: 2.7204, Validation Loss: 5.2081
Epoch 12/20, Training Loss: 2.5353, Validation Loss: 4.8744
Epoch 13/20, Training Loss: 2.3978, Validation Loss: 4.6075
Epoch 14/20, Training Loss: 2.2928, Validation Loss: 4.3899
Epoch 15/20, Training Loss: 2.21

[I 2024-12-15 16:07:02,413] Trial 2 finished with value: 3.646738813706412 and parameters: {'learning_rate': 0.0025439998636188027, 'lambda_reg': 0.00010980672787850688, 'k': 28}. Best is trial 2 with value: 3.646738813706412.


Epoch 20/20, Training Loss: 1.9939, Validation Loss: 3.6467
Epoch 1/20, Training Loss: 12.4245, Validation Loss: 12.7212
Epoch 2/20, Training Loss: 12.4239, Validation Loss: 12.7213
Epoch 3/20, Training Loss: 12.4232, Validation Loss: 12.7213
Epoch 4/20, Training Loss: 12.4226, Validation Loss: 12.7214
Epoch 5/20, Training Loss: 12.4217, Validation Loss: 12.7215


[I 2024-12-15 16:07:09,144] Trial 3 finished with value: 12.721222337493453 and parameters: {'learning_rate': 0.0007043759616598867, 'lambda_reg': 0.0008942074345371845, 'k': 50}. Best is trial 2 with value: 3.646738813706412.


Epoch 6/20, Training Loss: 12.4206, Validation Loss: 12.7216
Early stopping triggered at epoch 6. Best Validation Loss: 12.7212
Epoch 1/20, Training Loss: 12.3142, Validation Loss: 12.6270
Epoch 2/20, Training Loss: 11.2440, Validation Loss: 12.1792
Epoch 3/20, Training Loss: 6.3757, Validation Loss: 9.2536
Epoch 4/20, Training Loss: 4.6840, Validation Loss: 7.3331
Epoch 5/20, Training Loss: 4.1949, Validation Loss: 6.4897
Epoch 6/20, Training Loss: 4.0377, Validation Loss: 6.0592
Epoch 7/20, Training Loss: 3.9970, Validation Loss: 5.8190
Epoch 8/20, Training Loss: 4.0069, Validation Loss: 5.6794
Epoch 9/20, Training Loss: 4.0415, Validation Loss: 5.5981
Epoch 10/20, Training Loss: 4.0884, Validation Loss: 5.5531
Epoch 11/20, Training Loss: 4.1412, Validation Loss: 5.5320
Epoch 12/20, Training Loss: 4.1964, Validation Loss: 5.5276
Epoch 13/20, Training Loss: 4.2520, Validation Loss: 5.5356
Epoch 14/20, Training Loss: 4.3069, Validation Loss: 5.5529
Epoch 15/20, Training Loss: 4.3607, V

[I 2024-12-15 16:07:27,888] Trial 4 finished with value: 5.527625878980612 and parameters: {'learning_rate': 0.005404971391194554, 'lambda_reg': 0.00033182427843668906, 'k': 47}. Best is trial 2 with value: 3.646738813706412.


Epoch 17/20, Training Loss: 4.4636, Validation Loss: 5.6435
Early stopping triggered at epoch 17. Best Validation Loss: 5.5276
Epoch 1/20, Training Loss: 14.1096, Validation Loss: 14.4057
Epoch 2/20, Training Loss: 14.1095, Validation Loss: 14.4057
Epoch 3/20, Training Loss: 14.1094, Validation Loss: 14.4056
Epoch 4/20, Training Loss: 14.1093, Validation Loss: 14.4056
Epoch 5/20, Training Loss: 14.1091, Validation Loss: 14.4056
Epoch 6/20, Training Loss: 14.1090, Validation Loss: 14.4056
Epoch 7/20, Training Loss: 14.1089, Validation Loss: 14.4056
Epoch 8/20, Training Loss: 14.1088, Validation Loss: 14.4055
Epoch 9/20, Training Loss: 14.1087, Validation Loss: 14.4055
Epoch 10/20, Training Loss: 14.1086, Validation Loss: 14.4055
Epoch 11/20, Training Loss: 14.1085, Validation Loss: 14.4055
Epoch 12/20, Training Loss: 14.1084, Validation Loss: 14.4055
Epoch 13/20, Training Loss: 14.1083, Validation Loss: 14.4056
Epoch 14/20, Training Loss: 14.1082, Validation Loss: 14.4056


[I 2024-12-15 16:07:43,898] Trial 5 finished with value: 14.405538331737848 and parameters: {'learning_rate': 4.967083922015576e-05, 'lambda_reg': 0.005194231445336169, 'k': 23}. Best is trial 2 with value: 3.646738813706412.


Epoch 15/20, Training Loss: 14.1081, Validation Loss: 14.4056
Early stopping triggered at epoch 15. Best Validation Loss: 14.4055
Epoch 1/20, Training Loss: 12.8859, Validation Loss: 13.1820
Epoch 2/20, Training Loss: 12.8858, Validation Loss: 13.1820
Epoch 3/20, Training Loss: 12.8857, Validation Loss: 13.1820
Epoch 4/20, Training Loss: 12.8855, Validation Loss: 13.1820
Epoch 5/20, Training Loss: 12.8854, Validation Loss: 13.1820
Epoch 6/20, Training Loss: 12.8852, Validation Loss: 13.1820
Epoch 7/20, Training Loss: 12.8851, Validation Loss: 13.1820
Epoch 8/20, Training Loss: 12.8849, Validation Loss: 13.1820
Epoch 9/20, Training Loss: 12.8848, Validation Loss: 13.1820
Epoch 10/20, Training Loss: 12.8847, Validation Loss: 13.1820
Epoch 11/20, Training Loss: 12.8845, Validation Loss: 13.1819
Epoch 12/20, Training Loss: 12.8844, Validation Loss: 13.1819
Epoch 13/20, Training Loss: 12.8843, Validation Loss: 13.1819
Epoch 14/20, Training Loss: 12.8841, Validation Loss: 13.1819
Epoch 15/20

[I 2024-12-15 16:08:04,745] Trial 6 finished with value: 13.181947038082702 and parameters: {'learning_rate': 4.5079875421490405e-05, 'lambda_reg': 0.0011227403130258822, 'k': 15}. Best is trial 2 with value: 3.646738813706412.


Epoch 19/20, Training Loss: 12.8834, Validation Loss: 13.1820
Early stopping triggered at epoch 19. Best Validation Loss: 13.1819
Epoch 1/20, Training Loss: 12.7391, Validation Loss: 13.0423
Epoch 2/20, Training Loss: 12.7248, Validation Loss: 13.0429
Epoch 3/20, Training Loss: 12.6270, Validation Loss: 13.0256
Epoch 4/20, Training Loss: 11.9797, Validation Loss: 12.8724
Epoch 5/20, Training Loss: 10.0046, Validation Loss: 12.3276
Epoch 6/20, Training Loss: 8.7261, Validation Loss: 11.8143
Epoch 7/20, Training Loss: 8.5072, Validation Loss: 11.6079
Epoch 8/20, Training Loss: 8.6781, Validation Loss: 11.6192
Epoch 9/20, Training Loss: 9.0098, Validation Loss: 11.7560
Epoch 10/20, Training Loss: 9.3935, Validation Loss: 11.9548
Epoch 11/20, Training Loss: 9.7857, Validation Loss: 12.1835


[I 2024-12-15 16:08:17,733] Trial 7 finished with value: 11.607918401822484 and parameters: {'learning_rate': 0.002666908268608185, 'lambda_reg': 0.0012607130924920953, 'k': 22}. Best is trial 2 with value: 3.646738813706412.


Epoch 12/20, Training Loss: 10.1693, Validation Loss: 12.4252
Early stopping triggered at epoch 12. Best Validation Loss: 11.6079
Epoch 1/20, Training Loss: 12.4107, Validation Loss: 12.7073
Epoch 2/20, Training Loss: 12.4104, Validation Loss: 12.7073
Epoch 3/20, Training Loss: 12.4101, Validation Loss: 12.7072
Epoch 4/20, Training Loss: 12.4099, Validation Loss: 12.7072
Epoch 5/20, Training Loss: 12.4096, Validation Loss: 12.7072
Epoch 6/20, Training Loss: 12.4093, Validation Loss: 12.7072
Epoch 7/20, Training Loss: 12.4090, Validation Loss: 12.7072
Epoch 8/20, Training Loss: 12.4088, Validation Loss: 12.7072
Epoch 9/20, Training Loss: 12.4084, Validation Loss: 12.7072
Epoch 10/20, Training Loss: 12.4081, Validation Loss: 12.7072
Epoch 11/20, Training Loss: 12.4077, Validation Loss: 12.7072
Epoch 12/20, Training Loss: 12.4074, Validation Loss: 12.7072
Epoch 13/20, Training Loss: 12.4069, Validation Loss: 12.7071
Epoch 14/20, Training Loss: 12.4064, Validation Loss: 12.7071
Epoch 15/20

[I 2024-12-15 16:08:39,870] Trial 8 finished with value: 12.706324404617176 and parameters: {'learning_rate': 0.00025021852706087327, 'lambda_reg': 0.0007418691194253386, 'k': 46}. Best is trial 2 with value: 3.646738813706412.


Epoch 20/20, Training Loss: 12.4013, Validation Loss: 12.7063
Epoch 1/20, Training Loss: 15.8736, Validation Loss: 16.1716
Epoch 2/20, Training Loss: 15.8731, Validation Loss: 16.1716
Epoch 3/20, Training Loss: 15.8727, Validation Loss: 16.1715
Epoch 4/20, Training Loss: 15.8723, Validation Loss: 16.1715
Epoch 5/20, Training Loss: 15.8719, Validation Loss: 16.1716
Epoch 6/20, Training Loss: 15.8716, Validation Loss: 16.1717
Epoch 7/20, Training Loss: 15.8712, Validation Loss: 16.1718


[I 2024-12-15 16:08:48,497] Trial 9 finished with value: 16.171538402931937 and parameters: {'learning_rate': 0.00010365130657710538, 'lambda_reg': 0.005297210190985668, 'k': 12}. Best is trial 2 with value: 3.646738813706412.


Epoch 8/20, Training Loss: 15.8710, Validation Loss: 16.1719
Early stopping triggered at epoch 8. Best Validation Loss: 16.1715
Epoch 1/20, Training Loss: 12.3088, Validation Loss: 12.6044
Epoch 2/20, Training Loss: 12.3088, Validation Loss: 12.6044
Epoch 3/20, Training Loss: 12.3088, Validation Loss: 12.6044
Epoch 4/20, Training Loss: 12.3088, Validation Loss: 12.6044
Epoch 5/20, Training Loss: 12.3088, Validation Loss: 12.6044
Epoch 6/20, Training Loss: 12.3088, Validation Loss: 12.6044
Epoch 7/20, Training Loss: 12.3087, Validation Loss: 12.6044
Epoch 8/20, Training Loss: 12.3087, Validation Loss: 12.6044
Epoch 9/20, Training Loss: 12.3087, Validation Loss: 12.6044
Epoch 10/20, Training Loss: 12.3087, Validation Loss: 12.6044
Epoch 11/20, Training Loss: 12.3087, Validation Loss: 12.6044
Epoch 12/20, Training Loss: 12.3087, Validation Loss: 12.6044
Epoch 13/20, Training Loss: 12.3087, Validation Loss: 12.6044
Epoch 14/20, Training Loss: 12.3086, Validation Loss: 12.6044
Epoch 15/20, 

[I 2024-12-15 16:09:10,377] Trial 10 finished with value: 12.604360148259428 and parameters: {'learning_rate': 1.057847323785517e-05, 'lambda_reg': 0.00012191505617007474, 'k': 34}. Best is trial 2 with value: 3.646738813706412.


Epoch 20/20, Training Loss: 12.3086, Validation Loss: 12.6044
Epoch 1/20, Training Loss: 12.1276, Validation Loss: 12.5302
Epoch 2/20, Training Loss: 7.2018, Validation Loss: 9.7258
Epoch 3/20, Training Loss: 3.9779, Validation Loss: 6.4168
Epoch 4/20, Training Loss: 3.0929, Validation Loss: 5.1477
Epoch 5/20, Training Loss: 2.7349, Validation Loss: 4.5147
Epoch 6/20, Training Loss: 2.5482, Validation Loss: 4.1434
Epoch 7/20, Training Loss: 2.4399, Validation Loss: 3.9042
Epoch 8/20, Training Loss: 2.3748, Validation Loss: 3.7419
Epoch 9/20, Training Loss: 2.3356, Validation Loss: 3.6287
Epoch 10/20, Training Loss: 2.3127, Validation Loss: 3.5496
Epoch 11/20, Training Loss: 2.3003, Validation Loss: 3.4956
Epoch 12/20, Training Loss: 2.2949, Validation Loss: 3.4608
Epoch 13/20, Training Loss: 2.2942, Validation Loss: 3.4411
Epoch 14/20, Training Loss: 2.2966, Validation Loss: 3.4331
Epoch 15/20, Training Loss: 2.3011, Validation Loss: 3.4341
Epoch 16/20, Training Loss: 2.3068, Validatio

[I 2024-12-15 16:09:31,078] Trial 11 finished with value: 3.433062764286082 and parameters: {'learning_rate': 0.006966260272668645, 'lambda_reg': 0.00014130419810859835, 'k': 39}. Best is trial 11 with value: 3.433062764286082.


Epoch 19/20, Training Loss: 2.3260, Validation Loss: 3.4881
Early stopping triggered at epoch 19. Best Validation Loss: 3.4331
Epoch 1/20, Training Loss: 12.3007, Validation Loss: 12.5991
Epoch 2/20, Training Loss: 12.2968, Validation Loss: 12.5984
Epoch 3/20, Training Loss: 12.2864, Validation Loss: 12.5952
Epoch 4/20, Training Loss: 12.2519, Validation Loss: 12.5824
Epoch 5/20, Training Loss: 12.1273, Validation Loss: 12.5330
Epoch 6/20, Training Loss: 11.6933, Validation Loss: 12.3541
Epoch 7/20, Training Loss: 10.4659, Validation Loss: 11.8152
Epoch 8/20, Training Loss: 8.3686, Validation Loss: 10.7451
Epoch 9/20, Training Loss: 6.5120, Validation Loss: 9.5025
Epoch 10/20, Training Loss: 5.2958, Validation Loss: 8.4463
Epoch 11/20, Training Loss: 4.4714, Validation Loss: 7.6067
Epoch 12/20, Training Loss: 3.8957, Validation Loss: 6.9417
Epoch 13/20, Training Loss: 3.4841, Validation Loss: 6.4104
Epoch 14/20, Training Loss: 3.1813, Validation Loss: 5.9801
Epoch 15/20, Training Loss:

[I 2024-12-15 16:09:52,683] Trial 12 finished with value: 4.527515783666918 and parameters: {'learning_rate': 0.0017225450345013371, 'lambda_reg': 0.00011237970343571529, 'k': 39}. Best is trial 11 with value: 3.433062764286082.


Epoch 20/20, Training Loss: 2.3511, Validation Loss: 4.5275
Epoch 1/20, Training Loss: 9.8949, Validation Loss: 11.4025
Epoch 2/20, Training Loss: 5.0620, Validation Loss: 7.1301
Epoch 3/20, Training Loss: 4.1272, Validation Loss: 5.8446
Epoch 4/20, Training Loss: 3.8860, Validation Loss: 5.3772
Epoch 5/20, Training Loss: 3.8042, Validation Loss: 5.1621
Epoch 6/20, Training Loss: 3.7898, Validation Loss: 5.0564
Epoch 7/20, Training Loss: 3.8090, Validation Loss: 5.0098
Epoch 8/20, Training Loss: 3.8454, Validation Loss: 5.0009
Epoch 9/20, Training Loss: 3.8908, Validation Loss: 5.0189
Epoch 10/20, Training Loss: 3.9409, Validation Loss: 5.0570
Epoch 11/20, Training Loss: 3.9934, Validation Loss: 5.1099
Epoch 12/20, Training Loss: 4.0464, Validation Loss: 5.1727


[I 2024-12-15 16:10:06,850] Trial 13 finished with value: 5.00087514886319 and parameters: {'learning_rate': 0.009610512779414496, 'lambda_reg': 0.00027274530142059965, 'k': 30}. Best is trial 11 with value: 3.433062764286082.


Epoch 13/20, Training Loss: 4.0986, Validation Loss: 5.2409
Early stopping triggered at epoch 13. Best Validation Loss: 5.0009
Epoch 1/20, Training Loss: 20.0683, Validation Loss: 20.3667
Epoch 2/20, Training Loss: 20.0888, Validation Loss: 20.3906
Epoch 3/20, Training Loss: 20.1742, Validation Loss: 20.4838
Epoch 4/20, Training Loss: 20.4766, Validation Loss: 20.8134
Epoch 5/20, Training Loss: 21.6486, Validation Loss: 22.0919


[I 2024-12-15 16:10:13,477] Trial 14 finished with value: 20.366681294810427 and parameters: {'learning_rate': 0.0019623519277194214, 'lambda_reg': 0.038177877812858675, 'k': 40}. Best is trial 11 with value: 3.433062764286082.


Epoch 6/20, Training Loss: 26.1965, Validation Loss: 27.0326
Early stopping triggered at epoch 6. Best Validation Loss: 20.3667
Epoch 1/20, Training Loss: 12.3562, Validation Loss: 12.6539
Epoch 2/20, Training Loss: 12.3547, Validation Loss: 12.6537
Epoch 3/20, Training Loss: 12.3529, Validation Loss: 12.6535
Epoch 4/20, Training Loss: 12.3504, Validation Loss: 12.6530
Epoch 5/20, Training Loss: 12.3466, Validation Loss: 12.6521
Epoch 6/20, Training Loss: 12.3406, Validation Loss: 12.6503
Epoch 7/20, Training Loss: 12.3302, Validation Loss: 12.6469
Epoch 8/20, Training Loss: 12.3120, Validation Loss: 12.6405
Epoch 9/20, Training Loss: 12.2793, Validation Loss: 12.6285
Epoch 10/20, Training Loss: 12.2204, Validation Loss: 12.6065
Epoch 11/20, Training Loss: 12.1147, Validation Loss: 12.5662
Epoch 12/20, Training Loss: 11.9287, Validation Loss: 12.4940
Epoch 13/20, Training Loss: 11.6128, Validation Loss: 12.3692
Epoch 14/20, Training Loss: 11.1098, Validation Loss: 12.1644
Epoch 15/20, 

[I 2024-12-15 16:10:34,941] Trial 15 finished with value: 9.496581801473422 and parameters: {'learning_rate': 0.0007923863283841693, 'lambda_reg': 0.0002878072774885186, 'k': 30}. Best is trial 11 with value: 3.433062764286082.


Epoch 20/20, Training Loss: 6.3943, Validation Loss: 9.4966
Epoch 1/20, Training Loss: 18.9234, Validation Loss: 19.2259
Epoch 2/20, Training Loss: 18.9869, Validation Loss: 19.3033
Epoch 3/20, Training Loss: 19.2886, Validation Loss: 19.6770
Epoch 4/20, Training Loss: 21.1940, Validation Loss: 22.0707
Epoch 5/20, Training Loss: 30.1299, Validation Loss: 32.5249


[I 2024-12-15 16:10:41,487] Trial 16 finished with value: 19.225856916804823 and parameters: {'learning_rate': 0.002873770363543256, 'lambda_reg': 0.014694262692562864, 'k': 18}. Best is trial 11 with value: 3.433062764286082.


Epoch 6/20, Training Loss: 46.0447, Validation Loss: 49.1897
Early stopping triggered at epoch 6. Best Validation Loss: 19.2259
Epoch 1/20, Training Loss: 10.0091, Validation Loss: 11.4017
Epoch 2/20, Training Loss: 4.3482, Validation Loss: 6.4250
Epoch 3/20, Training Loss: 3.0262, Validation Loss: 4.7575
Epoch 4/20, Training Loss: 2.5508, Validation Loss: 4.0613
Epoch 5/20, Training Loss: 2.3060, Validation Loss: 3.6860
Epoch 6/20, Training Loss: 2.1731, Validation Loss: 3.4596
Epoch 7/20, Training Loss: 2.1011, Validation Loss: 3.3175
Epoch 8/20, Training Loss: 2.0624, Validation Loss: 3.2294
Epoch 9/20, Training Loss: 2.0424, Validation Loss: 3.1782
Epoch 10/20, Training Loss: 2.0330, Validation Loss: 3.1535
Epoch 11/20, Training Loss: 2.0293, Validation Loss: 3.1477
Epoch 12/20, Training Loss: 2.0286, Validation Loss: 3.1549
Epoch 13/20, Training Loss: 2.0291, Validation Loss: 3.1704
Epoch 14/20, Training Loss: 2.0299, Validation Loss: 3.1906
Epoch 15/20, Training Loss: 2.0306, Val

[I 2024-12-15 16:10:58,664] Trial 17 finished with value: 3.1477262233085597 and parameters: {'learning_rate': 0.00964212381604277, 'lambda_reg': 0.00011623361275907739, 'k': 35}. Best is trial 17 with value: 3.1477262233085597.


Epoch 16/20, Training Loss: 2.0314, Validation Loss: 3.2376
Early stopping triggered at epoch 16. Best Validation Loss: 3.1477
Epoch 1/20, Training Loss: 12.2471, Validation Loss: 13.5198
Epoch 2/20, Training Loss: 13.4880, Validation Loss: 15.6373
Epoch 3/20, Training Loss: 16.0447, Validation Loss: 17.8135
Epoch 4/20, Training Loss: 18.1208, Validation Loss: 19.6502
Epoch 5/20, Training Loss: 19.7327, Validation Loss: 21.1206


[I 2024-12-15 16:11:05,207] Trial 18 finished with value: 13.51976300214832 and parameters: {'learning_rate': 0.009558431031003633, 'lambda_reg': 0.0019932077202119915, 'k': 35}. Best is trial 17 with value: 3.1477262233085597.


Epoch 6/20, Training Loss: 21.0353, Validation Loss: 22.3244
Early stopping triggered at epoch 6. Best Validation Loss: 13.5198
Epoch 1/20, Training Loss: 12.3604, Validation Loss: 12.6569
Epoch 2/20, Training Loss: 12.3600, Validation Loss: 12.6568
Epoch 3/20, Training Loss: 12.3597, Validation Loss: 12.6568
Epoch 4/20, Training Loss: 12.3594, Validation Loss: 12.6568
Epoch 5/20, Training Loss: 12.3590, Validation Loss: 12.6568
Epoch 6/20, Training Loss: 12.3586, Validation Loss: 12.6568
Epoch 7/20, Training Loss: 12.3583, Validation Loss: 12.6567
Epoch 8/20, Training Loss: 12.3578, Validation Loss: 12.6567
Epoch 9/20, Training Loss: 12.3574, Validation Loss: 12.6567
Epoch 10/20, Training Loss: 12.3569, Validation Loss: 12.6566
Epoch 11/20, Training Loss: 12.3563, Validation Loss: 12.6565
Epoch 12/20, Training Loss: 12.3556, Validation Loss: 12.6564
Epoch 13/20, Training Loss: 12.3548, Validation Loss: 12.6563
Epoch 14/20, Training Loss: 12.3539, Validation Loss: 12.6561
Epoch 15/20, 

[I 2024-12-15 16:11:27,131] Trial 19 finished with value: 12.652866972573117 and parameters: {'learning_rate': 0.00030710566881568654, 'lambda_reg': 0.00043721398941386447, 'k': 44}. Best is trial 17 with value: 3.1477262233085597.


Epoch 20/20, Training Loss: 12.3422, Validation Loss: 12.6529
Epoch 1/20, Training Loss: 12.3012, Validation Loss: 12.6150
Epoch 2/20, Training Loss: 11.4919, Validation Loss: 12.2643
Epoch 3/20, Training Loss: 6.7528, Validation Loss: 9.5678
Epoch 4/20, Training Loss: 4.4057, Validation Loss: 7.2208
Epoch 5/20, Training Loss: 3.5761, Validation Loss: 6.0610
Epoch 6/20, Training Loss: 3.2126, Validation Loss: 5.4090
Epoch 7/20, Training Loss: 3.0266, Validation Loss: 5.0028
Epoch 8/20, Training Loss: 2.9238, Validation Loss: 4.7317
Epoch 9/20, Training Loss: 2.8660, Validation Loss: 4.5423
Epoch 10/20, Training Loss: 2.8348, Validation Loss: 4.4058
Epoch 11/20, Training Loss: 2.8204, Validation Loss: 4.3056
Epoch 12/20, Training Loss: 2.8169, Validation Loss: 4.2314
Epoch 13/20, Training Loss: 2.8205, Validation Loss: 4.1765
Epoch 14/20, Training Loss: 2.8286, Validation Loss: 4.1365
Epoch 15/20, Training Loss: 2.8398, Validation Loss: 4.1083
Epoch 16/20, Training Loss: 2.8528, Validat

[I 2024-12-15 16:11:49,217] Trial 20 finished with value: 4.074508815577245 and parameters: {'learning_rate': 0.004806387053894252, 'lambda_reg': 0.00019540284819336205, 'k': 34}. Best is trial 17 with value: 3.1477262233085597.


Epoch 20/20, Training Loss: 2.9121, Validation Loss: 4.0810
Epoch 1/20, Training Loss: 12.2975, Validation Loss: 12.6046
Epoch 2/20, Training Loss: 12.0788, Validation Loss: 12.5143
Epoch 3/20, Training Loss: 9.4000, Validation Loss: 11.2309
Epoch 4/20, Training Loss: 5.3699, Validation Loss: 8.3762
Epoch 5/20, Training Loss: 3.7670, Validation Loss: 6.6069
Epoch 6/20, Training Loss: 3.0344, Validation Loss: 5.5828
Epoch 7/20, Training Loss: 2.6405, Validation Loss: 4.9337
Epoch 8/20, Training Loss: 2.4008, Validation Loss: 4.4902
Epoch 9/20, Training Loss: 2.2428, Validation Loss: 4.1703
Epoch 10/20, Training Loss: 2.1334, Validation Loss: 3.9304
Epoch 11/20, Training Loss: 2.0552, Validation Loss: 3.7453
Epoch 12/20, Training Loss: 1.9981, Validation Loss: 3.5994
Epoch 13/20, Training Loss: 1.9556, Validation Loss: 3.4828
Epoch 14/20, Training Loss: 1.9235, Validation Loss: 3.3887
Epoch 15/20, Training Loss: 1.8990, Validation Loss: 3.3122
Epoch 16/20, Training Loss: 1.8800, Validati

[I 2024-12-15 16:12:10,738] Trial 21 finished with value: 3.0987913564505427 and parameters: {'learning_rate': 0.004099151484173524, 'lambda_reg': 0.00010453688227089353, 'k': 27}. Best is trial 21 with value: 3.0987913564505427.


Epoch 20/20, Training Loss: 1.8372, Validation Loss: 3.0988
Epoch 1/20, Training Loss: 12.3689, Validation Loss: 12.6664
Epoch 2/20, Training Loss: 12.3673, Validation Loss: 12.6663
Epoch 3/20, Training Loss: 12.3647, Validation Loss: 12.6660
Epoch 4/20, Training Loss: 12.3596, Validation Loss: 12.6650
Epoch 5/20, Training Loss: 12.3475, Validation Loss: 12.6615
Epoch 6/20, Training Loss: 12.3167, Validation Loss: 12.6515
Epoch 7/20, Training Loss: 12.2369, Validation Loss: 12.6239
Epoch 8/20, Training Loss: 12.0335, Validation Loss: 12.5509
Epoch 9/20, Training Loss: 11.5516, Validation Loss: 12.3720
Epoch 10/20, Training Loss: 10.5933, Validation Loss: 11.9944
Epoch 11/20, Training Loss: 9.2126, Validation Loss: 11.3826
Epoch 12/20, Training Loss: 7.8928, Validation Loss: 10.6649
Epoch 13/20, Training Loss: 6.9378, Validation Loss: 9.9980
Epoch 14/20, Training Loss: 6.2749, Validation Loss: 9.4322
Epoch 15/20, Training Loss: 5.8038, Validation Loss: 8.9648
Epoch 16/20, Training Loss:

[I 2024-12-15 16:12:32,599] Trial 22 finished with value: 7.638617234012541 and parameters: {'learning_rate': 0.0012661850826462058, 'lambda_reg': 0.0004789542250192062, 'k': 43}. Best is trial 21 with value: 3.0987913564505427.


Epoch 20/20, Training Loss: 4.8745, Validation Loss: 7.6386
Epoch 1/20, Training Loss: 12.3109, Validation Loss: 12.6149
Epoch 2/20, Training Loss: 12.1787, Validation Loss: 12.5612
Epoch 3/20, Training Loss: 10.2185, Validation Loss: 11.6651
Epoch 4/20, Training Loss: 5.9858, Validation Loss: 8.9496
Epoch 5/20, Training Loss: 4.2416, Validation Loss: 7.1121
Epoch 6/20, Training Loss: 3.5025, Validation Loss: 6.0874
Epoch 7/20, Training Loss: 3.1374, Validation Loss: 5.4603
Epoch 8/20, Training Loss: 2.9334, Validation Loss: 5.0458
Epoch 9/20, Training Loss: 2.8110, Validation Loss: 4.7565
Epoch 10/20, Training Loss: 2.7352, Validation Loss: 4.5463
Epoch 11/20, Training Loss: 2.6880, Validation Loss: 4.3895
Epoch 12/20, Training Loss: 2.6594, Validation Loss: 4.2701
Epoch 13/20, Training Loss: 2.6433, Validation Loss: 4.1782
Epoch 14/20, Training Loss: 2.6357, Validation Loss: 4.1069
Epoch 15/20, Training Loss: 2.6339, Validation Loss: 4.0517
Epoch 16/20, Training Loss: 2.6361, Validat

[I 2024-12-15 16:12:54,328] Trial 23 finished with value: 3.92536395049727 and parameters: {'learning_rate': 0.0040943212020290295, 'lambda_reg': 0.00017963339382078876, 'k': 36}. Best is trial 21 with value: 3.0987913564505427.


Epoch 20/20, Training Loss: 2.6663, Validation Loss: 3.9254
Epoch 1/20, Training Loss: 9.4148, Validation Loss: 11.0440
Epoch 2/20, Training Loss: 4.4849, Validation Loss: 6.4494
Epoch 3/20, Training Loss: 3.3985, Validation Loss: 5.0553
Epoch 4/20, Training Loss: 3.0354, Validation Loss: 4.4936
Epoch 5/20, Training Loss: 2.8631, Validation Loss: 4.2021
Epoch 6/20, Training Loss: 2.7814, Validation Loss: 4.0358
Epoch 7/20, Training Loss: 2.7468, Validation Loss: 3.9407
Epoch 8/20, Training Loss: 2.7377, Validation Loss: 3.8920
Epoch 9/20, Training Loss: 2.7429, Validation Loss: 3.8757
Epoch 10/20, Training Loss: 2.7566, Validation Loss: 3.8826
Epoch 11/20, Training Loss: 2.7749, Validation Loss: 3.9054
Epoch 12/20, Training Loss: 2.7953, Validation Loss: 3.9379
Epoch 13/20, Training Loss: 2.8157, Validation Loss: 3.9755


[I 2024-12-15 16:13:09,428] Trial 24 finished with value: 3.8756627165865725 and parameters: {'learning_rate': 0.009971529359620148, 'lambda_reg': 0.000174119365113973, 'k': 27}. Best is trial 21 with value: 3.0987913564505427.


Epoch 14/20, Training Loss: 2.8353, Validation Loss: 4.0153
Early stopping triggered at epoch 14. Best Validation Loss: 3.8757
Epoch 1/20, Training Loss: 12.3921, Validation Loss: 12.7139
Epoch 2/20, Training Loss: 11.2100, Validation Loss: 12.2740
Epoch 3/20, Training Loss: 7.0219, Validation Loss: 9.9461
Epoch 4/20, Training Loss: 5.8643, Validation Loss: 8.5741
Epoch 5/20, Training Loss: 5.6948, Validation Loss: 8.0558
Epoch 6/20, Training Loss: 5.7754, Validation Loss: 7.8615
Epoch 7/20, Training Loss: 5.9265, Validation Loss: 7.8090
Epoch 8/20, Training Loss: 6.0977, Validation Loss: 7.8263
Epoch 9/20, Training Loss: 6.2711, Validation Loss: 7.8799
Epoch 10/20, Training Loss: 6.4392, Validation Loss: 7.9527
Epoch 11/20, Training Loss: 6.5991, Validation Loss: 8.0358


[I 2024-12-15 16:13:22,594] Trial 25 finished with value: 7.809017800063553 and parameters: {'learning_rate': 0.005123024539454447, 'lambda_reg': 0.0005889116771972554, 'k': 33}. Best is trial 21 with value: 3.0987913564505427.


Epoch 12/20, Training Loss: 6.7499, Validation Loss: 8.1245
Early stopping triggered at epoch 12. Best Validation Loss: 7.8090
Epoch 1/20, Training Loss: 12.2985, Validation Loss: 12.5964
Epoch 2/20, Training Loss: 12.2962, Validation Loss: 12.5961
Epoch 3/20, Training Loss: 12.2922, Validation Loss: 12.5953
Epoch 4/20, Training Loss: 12.2830, Validation Loss: 12.5926
Epoch 5/20, Training Loss: 12.2584, Validation Loss: 12.5839
Epoch 6/20, Training Loss: 12.1878, Validation Loss: 12.5570
Epoch 7/20, Training Loss: 11.9850, Validation Loss: 12.4763
Epoch 8/20, Training Loss: 11.4423, Validation Loss: 12.2513
Epoch 9/20, Training Loss: 10.2531, Validation Loss: 11.7235
Epoch 10/20, Training Loss: 8.4942, Validation Loss: 10.8261
Epoch 11/20, Training Loss: 6.8769, Validation Loss: 9.7888
Epoch 12/20, Training Loss: 5.7218, Validation Loss: 8.8531
Epoch 13/20, Training Loss: 4.8953, Validation Loss: 8.0708
Epoch 14/20, Training Loss: 4.2867, Validation Loss: 7.4247
Epoch 15/20, Training L

[I 2024-12-15 16:13:44,399] Trial 26 finished with value: 5.247330840498527 and parameters: {'learning_rate': 0.0014241481137006037, 'lambda_reg': 0.00010385960337157155, 'k': 41}. Best is trial 21 with value: 3.0987913564505427.


Epoch 20/20, Training Loss: 2.6935, Validation Loss: 5.2473
Epoch 1/20, Training Loss: 12.7596, Validation Loss: 13.0634
Epoch 2/20, Training Loss: 12.7213, Validation Loss: 13.0682
Epoch 3/20, Training Loss: 12.2127, Validation Loss: 13.0461
Epoch 4/20, Training Loss: 10.4247, Validation Loss: 13.0268
Epoch 5/20, Training Loss: 10.4601, Validation Loss: 13.5117
Epoch 6/20, Training Loss: 11.3509, Validation Loss: 14.2210
Epoch 7/20, Training Loss: 12.4007, Validation Loss: 15.0157
Epoch 8/20, Training Loss: 13.4075, Validation Loss: 15.7929


[I 2024-12-15 16:13:54,143] Trial 27 finished with value: 13.02680001277477 and parameters: {'learning_rate': 0.003536766682235287, 'lambda_reg': 0.0019030550517473268, 'k': 32}. Best is trial 21 with value: 3.0987913564505427.


Epoch 9/20, Training Loss: 14.3298, Validation Loss: 16.5249
Early stopping triggered at epoch 9. Best Validation Loss: 13.0268
Epoch 1/20, Training Loss: 12.0967, Validation Loss: 12.5535
Epoch 2/20, Training Loss: 7.0184, Validation Loss: 9.6790
Epoch 3/20, Training Loss: 4.2904, Validation Loss: 6.7884
Epoch 4/20, Training Loss: 3.5606, Validation Loss: 5.6722
Epoch 5/20, Training Loss: 3.2862, Validation Loss: 5.1239
Epoch 6/20, Training Loss: 3.1600, Validation Loss: 4.8126
Epoch 7/20, Training Loss: 3.1002, Validation Loss: 4.6215
Epoch 8/20, Training Loss: 3.0759, Validation Loss: 4.4997
Epoch 9/20, Training Loss: 3.0720, Validation Loss: 4.4219
Epoch 10/20, Training Loss: 3.0802, Validation Loss: 4.3743
Epoch 11/20, Training Loss: 3.0958, Validation Loss: 4.3486
Epoch 12/20, Training Loss: 3.1159, Validation Loss: 4.3395
Epoch 13/20, Training Loss: 3.1388, Validation Loss: 4.3429
Epoch 14/20, Training Loss: 3.1632, Validation Loss: 4.3558
Epoch 15/20, Training Loss: 3.1882, Val

[I 2024-12-15 16:14:12,408] Trial 28 finished with value: 4.3394792947246845 and parameters: {'learning_rate': 0.006692754039244508, 'lambda_reg': 0.00021208458761990052, 'k': 19}. Best is trial 21 with value: 3.0987913564505427.


Epoch 17/20, Training Loss: 3.2378, Validation Loss: 4.4284
Early stopping triggered at epoch 17. Best Validation Loss: 4.3395
Epoch 1/20, Training Loss: 28.2376, Validation Loss: 28.6577
Epoch 2/20, Training Loss: 99.1822, Validation Loss: 101.8612
Epoch 3/20, Training Loss: 214.9684, Validation Loss: 217.6757
Epoch 4/20, Training Loss: 283.6638, Validation Loss: 285.9377
Epoch 5/20, Training Loss: 332.6586, Validation Loss: 334.6149


[I 2024-12-15 16:14:18,960] Trial 29 finished with value: 28.657709880542797 and parameters: {'learning_rate': 0.006817864539411394, 'lambda_reg': 0.04505229873679308, 'k': 27}. Best is trial 21 with value: 3.0987913564505427.


Epoch 6/20, Training Loss: 370.5970, Validation Loss: 372.3309
Early stopping triggered at epoch 6. Best Validation Loss: 28.6577
Epoch 1/20, Training Loss: 12.3615, Validation Loss: 12.6582
Epoch 2/20, Training Loss: 12.3609, Validation Loss: 12.6581
Epoch 3/20, Training Loss: 12.3603, Validation Loss: 12.6581
Epoch 4/20, Training Loss: 12.3597, Validation Loss: 12.6581
Epoch 5/20, Training Loss: 12.3590, Validation Loss: 12.6580
Epoch 6/20, Training Loss: 12.3582, Validation Loss: 12.6580
Epoch 7/20, Training Loss: 12.3572, Validation Loss: 12.6579
Epoch 8/20, Training Loss: 12.3561, Validation Loss: 12.6577
Epoch 9/20, Training Loss: 12.3546, Validation Loss: 12.6575
Epoch 10/20, Training Loss: 12.3528, Validation Loss: 12.6571
Epoch 11/20, Training Loss: 12.3504, Validation Loss: 12.6565
Epoch 12/20, Training Loss: 12.3471, Validation Loss: 12.6557
Epoch 13/20, Training Loss: 12.3428, Validation Loss: 12.6544
Epoch 14/20, Training Loss: 12.3369, Validation Loss: 12.6527
Epoch 15/20

[I 2024-12-15 16:14:41,230] Trial 30 finished with value: 12.609613081315453 and parameters: {'learning_rate': 0.0004326908362536612, 'lambda_reg': 0.00037558701101824787, 'k': 37}. Best is trial 21 with value: 3.0987913564505427.


Epoch 20/20, Training Loss: 12.2114, Validation Loss: 12.6096
Epoch 1/20, Training Loss: 12.3070, Validation Loss: 12.6078
Epoch 2/20, Training Loss: 12.2828, Validation Loss: 12.5997
Epoch 3/20, Training Loss: 12.0878, Validation Loss: 12.5195
Epoch 4/20, Training Loss: 10.6900, Validation Loss: 11.8971
Epoch 5/20, Training Loss: 7.2328, Validation Loss: 9.9736
Epoch 6/20, Training Loss: 5.0219, Validation Loss: 8.1041
Epoch 7/20, Training Loss: 3.8934, Validation Loss: 6.8601
Epoch 8/20, Training Loss: 3.2591, Validation Loss: 6.0196
Epoch 9/20, Training Loss: 2.8708, Validation Loss: 5.4259
Epoch 10/20, Training Loss: 2.6145, Validation Loss: 4.9883
Epoch 11/20, Training Loss: 2.4355, Validation Loss: 4.6542
Epoch 12/20, Training Loss: 2.3054, Validation Loss: 4.3921
Epoch 13/20, Training Loss: 2.2080, Validation Loss: 4.1819
Epoch 14/20, Training Loss: 2.1336, Validation Loss: 4.0103
Epoch 15/20, Training Loss: 2.0759, Validation Loss: 3.8683
Epoch 16/20, Training Loss: 2.0306, Val

[I 2024-12-15 16:15:02,769] Trial 31 finished with value: 3.4288600061966146 and parameters: {'learning_rate': 0.002932749435887332, 'lambda_reg': 0.00010760196034193446, 'k': 26}. Best is trial 21 with value: 3.0987913564505427.


Epoch 20/20, Training Loss: 1.9241, Validation Loss: 3.4289
Epoch 1/20, Training Loss: 12.3170, Validation Loss: 12.6257
Epoch 2/20, Training Loss: 12.1106, Validation Loss: 12.5463
Epoch 3/20, Training Loss: 9.7676, Validation Loss: 11.4750
Epoch 4/20, Training Loss: 5.8239, Validation Loss: 8.8497
Epoch 5/20, Training Loss: 4.1924, Validation Loss: 7.1138
Epoch 6/20, Training Loss: 3.4563, Validation Loss: 6.1012
Epoch 7/20, Training Loss: 3.0750, Validation Loss: 5.4620
Epoch 8/20, Training Loss: 2.8532, Validation Loss: 5.0295
Epoch 9/20, Training Loss: 2.7144, Validation Loss: 4.7215
Epoch 10/20, Training Loss: 2.6238, Validation Loss: 4.4937
Epoch 11/20, Training Loss: 2.5633, Validation Loss: 4.3206
Epoch 12/20, Training Loss: 2.5228, Validation Loss: 4.1864
Epoch 13/20, Training Loss: 2.4959, Validation Loss: 4.0810
Epoch 14/20, Training Loss: 2.4786, Validation Loss: 3.9974
Epoch 15/20, Training Loss: 2.4682, Validation Loss: 3.9308
Epoch 16/20, Training Loss: 2.4628, Validati

[I 2024-12-15 16:15:23,691] Trial 32 finished with value: 3.7578196144632683 and parameters: {'learning_rate': 0.0038487717066418266, 'lambda_reg': 0.00016106918150976174, 'k': 25}. Best is trial 21 with value: 3.0987913564505427.


Epoch 20/20, Training Loss: 2.4685, Validation Loss: 3.7578
Epoch 1/20, Training Loss: 12.3455, Validation Loss: 12.6432
Epoch 2/20, Training Loss: 12.3432, Validation Loss: 12.6429
Epoch 3/20, Training Loss: 12.3399, Validation Loss: 12.6423
Epoch 4/20, Training Loss: 12.3341, Validation Loss: 12.6407
Epoch 5/20, Training Loss: 12.3228, Validation Loss: 12.6370
Epoch 6/20, Training Loss: 12.2985, Validation Loss: 12.6283
Epoch 7/20, Training Loss: 12.2451, Validation Loss: 12.6083
Epoch 8/20, Training Loss: 12.1265, Validation Loss: 12.5628
Epoch 9/20, Training Loss: 11.8701, Validation Loss: 12.4621
Epoch 10/20, Training Loss: 11.3531, Validation Loss: 12.2529
Epoch 11/20, Training Loss: 10.4497, Validation Loss: 11.8669
Epoch 12/20, Training Loss: 9.2058, Validation Loss: 11.2792
Epoch 13/20, Training Loss: 7.9321, Validation Loss: 10.5717
Epoch 14/20, Training Loss: 6.8963, Validation Loss: 9.8690
Epoch 15/20, Training Loss: 6.1136, Validation Loss: 9.2365
Epoch 16/20, Training Los

[I 2024-12-15 16:15:45,442] Trial 33 finished with value: 7.152406674282025 and parameters: {'learning_rate': 0.0010735169410610427, 'lambda_reg': 0.0002406073366907951, 'k': 29}. Best is trial 21 with value: 3.0987913564505427.


Epoch 20/20, Training Loss: 4.1587, Validation Loss: 7.1524
Epoch 1/20, Training Loss: 12.3223, Validation Loss: 12.6226
Epoch 2/20, Training Loss: 12.3102, Validation Loss: 12.6196
Epoch 3/20, Training Loss: 12.2603, Validation Loss: 12.6017
Epoch 4/20, Training Loss: 12.0168, Validation Loss: 12.5048
Epoch 5/20, Training Loss: 10.9800, Validation Loss: 12.0637
Epoch 6/20, Training Loss: 8.5366, Validation Loss: 10.8536
Epoch 7/20, Training Loss: 6.2935, Validation Loss: 9.3290
Epoch 8/20, Training Loss: 4.9745, Validation Loss: 8.1133
Epoch 9/20, Training Loss: 4.1560, Validation Loss: 7.2119
Epoch 10/20, Training Loss: 3.6291, Validation Loss: 6.5394
Epoch 11/20, Training Loss: 3.2752, Validation Loss: 6.0272
Epoch 12/20, Training Loss: 3.0269, Validation Loss: 5.6275
Epoch 13/20, Training Loss: 2.8462, Validation Loss: 5.3089
Epoch 14/20, Training Loss: 2.7108, Validation Loss: 5.0500
Epoch 15/20, Training Loss: 2.6071, Validation Loss: 4.8364
Epoch 16/20, Training Loss: 2.5263, Va

[I 2024-12-15 16:16:07,033] Trial 34 finished with value: 4.170464070234884 and parameters: {'learning_rate': 0.002160988356950365, 'lambda_reg': 0.00014010688838985203, 'k': 24}. Best is trial 21 with value: 3.0987913564505427.


Epoch 20/20, Training Loss: 2.3380, Validation Loss: 4.1705
Epoch 1/20, Training Loss: 11.8009, Validation Loss: 12.4489
Epoch 2/20, Training Loss: 6.2226, Validation Loss: 8.9106
Epoch 3/20, Training Loss: 4.5715, Validation Loss: 6.8031
Epoch 4/20, Training Loss: 4.1827, Validation Loss: 6.0593
Epoch 5/20, Training Loss: 4.0804, Validation Loss: 5.7278
Epoch 6/20, Training Loss: 4.0683, Validation Loss: 5.5639
Epoch 7/20, Training Loss: 4.0940, Validation Loss: 5.4822
Epoch 8/20, Training Loss: 4.1377, Validation Loss: 5.4468
Epoch 9/20, Training Loss: 4.1900, Validation Loss: 5.4406
Epoch 10/20, Training Loss: 4.2464, Validation Loss: 5.4548
Epoch 11/20, Training Loss: 4.3042, Validation Loss: 5.4841
Epoch 12/20, Training Loss: 4.3623, Validation Loss: 5.5247
Epoch 13/20, Training Loss: 4.4198, Validation Loss: 5.5737


[I 2024-12-15 16:16:22,354] Trial 35 finished with value: 5.440618982409106 and parameters: {'learning_rate': 0.0074872096977469495, 'lambda_reg': 0.00031240036648782024, 'k': 21}. Best is trial 21 with value: 3.0987913564505427.


Epoch 14/20, Training Loss: 4.4760, Validation Loss: 5.6285
Early stopping triggered at epoch 14. Best Validation Loss: 5.4406
Epoch 1/20, Training Loss: 12.3038, Validation Loss: 12.6076
Epoch 2/20, Training Loss: 12.2472, Validation Loss: 12.5877
Epoch 3/20, Training Loss: 11.6964, Validation Loss: 12.3596
Epoch 4/20, Training Loss: 8.8438, Validation Loss: 10.9833
Epoch 5/20, Training Loss: 5.7067, Validation Loss: 8.7711
Epoch 6/20, Training Loss: 4.1996, Validation Loss: 7.2213
Epoch 7/20, Training Loss: 3.4016, Validation Loss: 6.2125
Epoch 8/20, Training Loss: 2.9394, Validation Loss: 5.5257
Epoch 9/20, Training Loss: 2.6462, Validation Loss: 5.0338
Epoch 10/20, Training Loss: 2.4474, Validation Loss: 4.6666
Epoch 11/20, Training Loss: 2.3060, Validation Loss: 4.3835
Epoch 12/20, Training Loss: 2.2022, Validation Loss: 4.1598
Epoch 13/20, Training Loss: 2.1243, Validation Loss: 3.9794
Epoch 14/20, Training Loss: 2.0649, Validation Loss: 3.8318
Epoch 15/20, Training Loss: 2.0191,

[I 2024-12-15 16:16:44,003] Trial 36 finished with value: 3.3314468187549453 and parameters: {'learning_rate': 0.0031190312968784714, 'lambda_reg': 0.0001070111735618258, 'k': 26}. Best is trial 21 with value: 3.0987913564505427.


Epoch 20/20, Training Loss: 1.9012, Validation Loss: 3.3314
Epoch 1/20, Training Loss: 12.3115, Validation Loss: 12.6081
Epoch 2/20, Training Loss: 12.3106, Validation Loss: 12.6081
Epoch 3/20, Training Loss: 12.3096, Validation Loss: 12.6081
Epoch 4/20, Training Loss: 12.3086, Validation Loss: 12.6080
Epoch 5/20, Training Loss: 12.3074, Validation Loss: 12.6080
Epoch 6/20, Training Loss: 12.3061, Validation Loss: 12.6078
Epoch 7/20, Training Loss: 12.3045, Validation Loss: 12.6076
Epoch 8/20, Training Loss: 12.3025, Validation Loss: 12.6072
Epoch 9/20, Training Loss: 12.2999, Validation Loss: 12.6066
Epoch 10/20, Training Loss: 12.2964, Validation Loss: 12.6057
Epoch 11/20, Training Loss: 12.2917, Validation Loss: 12.6044
Epoch 12/20, Training Loss: 12.2851, Validation Loss: 12.6023
Epoch 13/20, Training Loss: 12.2758, Validation Loss: 12.5993
Epoch 14/20, Training Loss: 12.2625, Validation Loss: 12.5946
Epoch 15/20, Training Loss: 12.2434, Validation Loss: 12.5878
Epoch 16/20, Traini

[I 2024-12-15 16:17:05,792] Trial 37 finished with value: 12.461318514918853 and parameters: {'learning_rate': 0.0005005086079879693, 'lambda_reg': 0.00010493794867272047, 'k': 26}. Best is trial 21 with value: 3.0987913564505427.


Epoch 20/20, Training Loss: 11.9178, Validation Loss: 12.4613
Epoch 1/20, Training Loss: 14.7689, Validation Loss: 15.0701
Epoch 2/20, Training Loss: 14.8001, Validation Loss: 15.1168
Epoch 3/20, Training Loss: 14.9782, Validation Loss: 15.3993
Epoch 4/20, Training Loss: 16.2172, Validation Loss: 17.3105
Epoch 5/20, Training Loss: 21.8281, Validation Loss: 24.4607


[I 2024-12-15 16:17:12,289] Trial 38 finished with value: 15.070101766321843 and parameters: {'learning_rate': 0.0028173650056427103, 'lambda_reg': 0.009478382790204616, 'k': 31}. Best is trial 21 with value: 3.0987913564505427.


Epoch 6/20, Training Loss: 31.0218, Validation Loss: 34.1494
Early stopping triggered at epoch 6. Best Validation Loss: 15.0701
Epoch 1/20, Training Loss: 12.5208, Validation Loss: 12.8185
Epoch 2/20, Training Loss: 12.5204, Validation Loss: 12.8185
Epoch 3/20, Training Loss: 12.5200, Validation Loss: 12.8185
Epoch 4/20, Training Loss: 12.5196, Validation Loss: 12.8185
Epoch 5/20, Training Loss: 12.5191, Validation Loss: 12.8185
Epoch 6/20, Training Loss: 12.5187, Validation Loss: 12.8184
Epoch 7/20, Training Loss: 12.5183, Validation Loss: 12.8184
Epoch 8/20, Training Loss: 12.5179, Validation Loss: 12.8185
Epoch 9/20, Training Loss: 12.5175, Validation Loss: 12.8185
Epoch 10/20, Training Loss: 12.5170, Validation Loss: 12.8185
Epoch 11/20, Training Loss: 12.5166, Validation Loss: 12.8185


[I 2024-12-15 16:17:25,370] Trial 39 finished with value: 12.818449853689147 and parameters: {'learning_rate': 0.0001781626744689916, 'lambda_reg': 0.0006244890231866175, 'k': 21}. Best is trial 21 with value: 3.0987913564505427.


Epoch 12/20, Training Loss: 12.5161, Validation Loss: 12.8185
Early stopping triggered at epoch 12. Best Validation Loss: 12.8184
Epoch 1/20, Training Loss: 12.3227, Validation Loss: 12.6219
Epoch 2/20, Training Loss: 12.3193, Validation Loss: 12.6215
Epoch 3/20, Training Loss: 12.3144, Validation Loss: 12.6207
Epoch 4/20, Training Loss: 12.3065, Validation Loss: 12.6187
Epoch 5/20, Training Loss: 12.2919, Validation Loss: 12.6142
Epoch 6/20, Training Loss: 12.2631, Validation Loss: 12.6043
Epoch 7/20, Training Loss: 12.2042, Validation Loss: 12.5825
Epoch 8/20, Training Loss: 12.0831, Validation Loss: 12.5358
Epoch 9/20, Training Loss: 11.8385, Validation Loss: 12.4385
Epoch 10/20, Training Loss: 11.3718, Validation Loss: 12.2462
Epoch 11/20, Training Loss: 10.5761, Validation Loss: 11.9011
Epoch 12/20, Training Loss: 9.4510, Validation Loss: 11.3696
Epoch 13/20, Training Loss: 8.2049, Validation Loss: 10.6962
Epoch 14/20, Training Loss: 7.0994, Validation Loss: 9.9853
Epoch 15/20, Tr

[I 2024-12-15 16:17:46,702] Trial 40 finished with value: 6.951803474350786 and parameters: {'learning_rate': 0.0009875991690017532, 'lambda_reg': 0.00010030091967269539, 'k': 18}. Best is trial 21 with value: 3.0987913564505427.


Epoch 20/20, Training Loss: 3.8495, Validation Loss: 6.9518
Epoch 1/20, Training Loss: 12.2258, Validation Loss: 12.5865
Epoch 2/20, Training Loss: 9.0092, Validation Loss: 11.0041
Epoch 3/20, Training Loss: 4.7329, Validation Loss: 7.5277
Epoch 4/20, Training Loss: 3.4882, Validation Loss: 5.9238
Epoch 5/20, Training Loss: 3.0079, Validation Loss: 5.1150
Epoch 6/20, Training Loss: 2.7710, Validation Loss: 4.6423
Epoch 7/20, Training Loss: 2.6382, Validation Loss: 4.3388
Epoch 8/20, Training Loss: 2.5593, Validation Loss: 4.1321
Epoch 9/20, Training Loss: 2.5117, Validation Loss: 3.9862
Epoch 10/20, Training Loss: 2.4837, Validation Loss: 3.8813
Epoch 11/20, Training Loss: 2.4683, Validation Loss: 3.8057
Epoch 12/20, Training Loss: 2.4615, Validation Loss: 3.7520
Epoch 13/20, Training Loss: 2.4607, Validation Loss: 3.7153
Epoch 14/20, Training Loss: 2.4640, Validation Loss: 3.6919
Epoch 15/20, Training Loss: 2.4701, Validation Loss: 3.6790
Epoch 16/20, Training Loss: 2.4782, Validation

[I 2024-12-15 16:18:08,397] Trial 41 finished with value: 3.6743550087672743 and parameters: {'learning_rate': 0.005721741002408693, 'lambda_reg': 0.0001579947381313396, 'k': 28}. Best is trial 21 with value: 3.0987913564505427.


Epoch 20/20, Training Loss: 2.5167, Validation Loss: 3.7056
Epoch 1/20, Training Loss: 12.3195, Validation Loss: 12.6218
Epoch 2/20, Training Loss: 12.2604, Validation Loss: 12.6000
Epoch 3/20, Training Loss: 11.4158, Validation Loss: 12.2433
Epoch 4/20, Training Loss: 7.6086, Validation Loss: 10.2513
Epoch 5/20, Training Loss: 5.0832, Validation Loss: 8.1036
Epoch 6/20, Training Loss: 4.0378, Validation Loss: 6.8508
Epoch 7/20, Training Loss: 3.5429, Validation Loss: 6.0917
Epoch 8/20, Training Loss: 3.2797, Validation Loss: 5.5977
Epoch 9/20, Training Loss: 3.1282, Validation Loss: 5.2574
Epoch 10/20, Training Loss: 3.0375, Validation Loss: 5.0133
Epoch 11/20, Training Loss: 2.9830, Validation Loss: 4.8328
Epoch 12/20, Training Loss: 2.9513, Validation Loss: 4.6965
Epoch 13/20, Training Loss: 2.9347, Validation Loss: 4.5922
Epoch 14/20, Training Loss: 2.9285, Validation Loss: 4.5117
Epoch 15/20, Training Loss: 2.9295, Validation Loss: 4.4493
Epoch 16/20, Training Loss: 2.9354, Valida

[I 2024-12-15 16:18:29,865] Trial 42 finished with value: 4.301248344113954 and parameters: {'learning_rate': 0.003676888477935432, 'lambda_reg': 0.00021175185307307245, 'k': 37}. Best is trial 21 with value: 3.0987913564505427.


Epoch 20/20, Training Loss: 2.9844, Validation Loss: 4.3012
Epoch 1/20, Training Loss: 12.3198, Validation Loss: 12.6211
Epoch 2/20, Training Loss: 12.3027, Validation Loss: 12.6162
Epoch 3/20, Training Loss: 12.2094, Validation Loss: 12.5808
Epoch 4/20, Training Loss: 11.6611, Validation Loss: 12.3547
Epoch 5/20, Training Loss: 9.5472, Validation Loss: 11.3820
Epoch 6/20, Training Loss: 6.7092, Validation Loss: 9.6540
Epoch 7/20, Training Loss: 5.0652, Validation Loss: 8.2026
Epoch 8/20, Training Loss: 4.1219, Validation Loss: 7.1698
Epoch 9/20, Training Loss: 3.5484, Validation Loss: 6.4273
Epoch 10/20, Training Loss: 3.1791, Validation Loss: 5.8782
Epoch 11/20, Training Loss: 2.9280, Validation Loss: 5.4597
Epoch 12/20, Training Loss: 2.7495, Validation Loss: 5.1321
Epoch 13/20, Training Loss: 2.6184, Validation Loss: 4.8700
Epoch 14/20, Training Loss: 2.5196, Validation Loss: 4.6564
Epoch 15/20, Training Loss: 2.4437, Validation Loss: 4.4798
Epoch 16/20, Training Loss: 2.3846, Vali

[I 2024-12-15 16:18:51,566] Trial 43 finished with value: 3.928825821938669 and parameters: {'learning_rate': 0.002440022178026772, 'lambda_reg': 0.0001352934274528079, 'k': 24}. Best is trial 21 with value: 3.0987913564505427.


Epoch 20/20, Training Loss: 2.2488, Validation Loss: 3.9288
Epoch 1/20, Training Loss: 12.2196, Validation Loss: 12.6070
Epoch 2/20, Training Loss: 7.8531, Validation Loss: 10.3034
Epoch 3/20, Training Loss: 4.9534, Validation Loss: 7.4268
Epoch 4/20, Training Loss: 4.3296, Validation Loss: 6.4224
Epoch 5/20, Training Loss: 4.1745, Validation Loss: 5.9912
Epoch 6/20, Training Loss: 4.1488, Validation Loss: 5.7809
Epoch 7/20, Training Loss: 4.1738, Validation Loss: 5.6752
Epoch 8/20, Training Loss: 4.2221, Validation Loss: 5.6258
Epoch 9/20, Training Loss: 4.2811, Validation Loss: 5.6102
Epoch 10/20, Training Loss: 4.3446, Validation Loss: 5.6169
Epoch 11/20, Training Loss: 4.4093, Validation Loss: 5.6395
Epoch 12/20, Training Loss: 4.4737, Validation Loss: 5.6739
Epoch 13/20, Training Loss: 4.5370, Validation Loss: 5.7172


[I 2024-12-15 16:19:06,797] Trial 44 finished with value: 5.610195188102157 and parameters: {'learning_rate': 0.006871367531710096, 'lambda_reg': 0.00033413384069534723, 'k': 32}. Best is trial 21 with value: 3.0987913564505427.


Epoch 14/20, Training Loss: 4.5984, Validation Loss: 5.7671
Early stopping triggered at epoch 14. Best Validation Loss: 5.6102
Epoch 1/20, Training Loss: 12.3073, Validation Loss: 12.6039
Epoch 2/20, Training Loss: 12.3073, Validation Loss: 12.6039
Epoch 3/20, Training Loss: 12.3073, Validation Loss: 12.6039
Epoch 4/20, Training Loss: 12.3073, Validation Loss: 12.6039
Epoch 5/20, Training Loss: 12.3072, Validation Loss: 12.6039
Epoch 6/20, Training Loss: 12.3072, Validation Loss: 12.6039
Epoch 7/20, Training Loss: 12.3072, Validation Loss: 12.6039
Epoch 8/20, Training Loss: 12.3072, Validation Loss: 12.6039
Epoch 9/20, Training Loss: 12.3071, Validation Loss: 12.6038
Epoch 10/20, Training Loss: 12.3071, Validation Loss: 12.6038
Epoch 11/20, Training Loss: 12.3071, Validation Loss: 12.6038
Epoch 12/20, Training Loss: 12.3071, Validation Loss: 12.6038
Epoch 13/20, Training Loss: 12.3071, Validation Loss: 12.6038
Epoch 14/20, Training Loss: 12.3070, Validation Loss: 12.6038
Epoch 15/20, T

[I 2024-12-15 16:19:28,452] Trial 45 finished with value: 12.60383339840268 and parameters: {'learning_rate': 1.8717546935274316e-05, 'lambda_reg': 0.0001388754878712323, 'k': 41}. Best is trial 21 with value: 3.0987913564505427.


Epoch 20/20, Training Loss: 12.3069, Validation Loss: 12.6038
Epoch 1/20, Training Loss: 12.3212, Validation Loss: 12.6190
Epoch 2/20, Training Loss: 12.3185, Validation Loss: 12.6186
Epoch 3/20, Training Loss: 12.3121, Validation Loss: 12.6170
Epoch 4/20, Training Loss: 12.2903, Validation Loss: 12.6096
Epoch 5/20, Training Loss: 12.2063, Validation Loss: 12.5780
Epoch 6/20, Training Loss: 11.8853, Validation Loss: 12.4515
Epoch 7/20, Training Loss: 10.8536, Validation Loss: 12.0219
Epoch 8/20, Training Loss: 8.8208, Validation Loss: 11.0517
Epoch 9/20, Training Loss: 6.8949, Validation Loss: 9.8410
Epoch 10/20, Training Loss: 5.6782, Validation Loss: 8.8116
Epoch 11/20, Training Loss: 4.8944, Validation Loss: 8.0127
Epoch 12/20, Training Loss: 4.3762, Validation Loss: 7.3984
Epoch 13/20, Training Loss: 4.0270, Validation Loss: 6.9221
Epoch 14/20, Training Loss: 3.7856, Validation Loss: 6.5472
Epoch 15/20, Training Loss: 3.6146, Validation Loss: 6.2473
Epoch 16/20, Training Loss: 3.49

[I 2024-12-15 16:19:50,530] Trial 46 finished with value: 5.378626452485868 and parameters: {'learning_rate': 0.0018213751394938742, 'lambda_reg': 0.0002493204899506392, 'k': 47}. Best is trial 21 with value: 3.0987913564505427.


Epoch 20/20, Training Loss: 3.2507, Validation Loss: 5.3786
Epoch 1/20, Training Loss: 12.4565, Validation Loss: 12.7647
Epoch 2/20, Training Loss: 12.0803, Validation Loss: 12.6524
Epoch 3/20, Training Loss: 8.7236, Validation Loss: 11.3138
Epoch 4/20, Training Loss: 7.2099, Validation Loss: 10.0674
Epoch 5/20, Training Loss: 7.2003, Validation Loss: 9.7325
Epoch 6/20, Training Loss: 7.5135, Validation Loss: 9.7443
Epoch 7/20, Training Loss: 7.8814, Validation Loss: 9.8821
Epoch 8/20, Training Loss: 8.2440, Validation Loss: 10.0704
Epoch 9/20, Training Loss: 8.5861, Validation Loss: 10.2775


[I 2024-12-15 16:20:01,339] Trial 47 finished with value: 9.73246009553857 and parameters: {'learning_rate': 0.004866090746483509, 'lambda_reg': 0.0008855739369574304, 'k': 38}. Best is trial 21 with value: 3.0987913564505427.


Epoch 10/20, Training Loss: 8.9039, Validation Loss: 10.4887
Early stopping triggered at epoch 10. Best Validation Loss: 9.7325
Epoch 1/20, Training Loss: 13.9931, Validation Loss: 14.2994
Epoch 2/20, Training Loss: 13.9806, Validation Loss: 14.3242
Epoch 3/20, Training Loss: 13.8153, Validation Loss: 14.4568
Epoch 4/20, Training Loss: 13.2053, Validation Loss: 15.2487
Epoch 5/20, Training Loss: 14.1676, Validation Loss: 17.2228


[I 2024-12-15 16:20:08,044] Trial 48 finished with value: 14.29940398513581 and parameters: {'learning_rate': 0.0030792340387138886, 'lambda_reg': 0.003169704018302183, 'k': 15}. Best is trial 21 with value: 3.0987913564505427.


Epoch 6/20, Training Loss: 16.1871, Validation Loss: 19.2361
Early stopping triggered at epoch 6. Best Validation Loss: 14.2994
Epoch 1/20, Training Loss: 10.4513, Validation Loss: 11.7126
Epoch 2/20, Training Loss: 4.6418, Validation Loss: 6.9581
Epoch 3/20, Training Loss: 3.2564, Validation Loss: 5.1629
Epoch 4/20, Training Loss: 2.7875, Validation Loss: 4.4094
Epoch 5/20, Training Loss: 2.5550, Validation Loss: 4.0074
Epoch 6/20, Training Loss: 2.4207, Validation Loss: 3.7638
Epoch 7/20, Training Loss: 2.3408, Validation Loss: 3.6074
Epoch 8/20, Training Loss: 2.2944, Validation Loss: 3.5056
Epoch 9/20, Training Loss: 2.2686, Validation Loss: 3.4413
Epoch 10/20, Training Loss: 2.2559, Validation Loss: 3.4042
Epoch 11/20, Training Loss: 2.2514, Validation Loss: 3.3874
Epoch 12/20, Training Loss: 2.2524, Validation Loss: 3.3856
Epoch 13/20, Training Loss: 2.2568, Validation Loss: 3.3947
Epoch 14/20, Training Loss: 2.2632, Validation Loss: 3.4110
Epoch 15/20, Training Loss: 2.2704, Val

[I 2024-12-15 16:20:26,543] Trial 49 finished with value: 3.385616004031578 and parameters: {'learning_rate': 0.008575062139789553, 'lambda_reg': 0.00013331201319424275, 'k': 22}. Best is trial 21 with value: 3.0987913564505427.


Epoch 17/20, Training Loss: 2.2853, Validation Loss: 3.4803
Early stopping triggered at epoch 17. Best Validation Loss: 3.3856

Best Parameters Found:
Learning Rate: 0.004099151484173524
Regularization: 0.00010453688227089353
Latent Factors: 27
Best Validation Loss: 3.0988

Optuna Study Results:
    trial  learning_rate  lambda_reg   k   val_loss
0       0       0.005959    0.096431  25  45.545911
1       1       0.000743    0.006593  38  13.987573
2       2       0.002544    0.000110  28   3.646739
3       3       0.000704    0.000894  50  12.721222
4       4       0.005405    0.000332  47   5.527626
5       5       0.000050    0.005194  23  14.405538
6       6       0.000045    0.001123  15  13.181947
7       7       0.002667    0.001261  22  11.607918
8       8       0.000250    0.000742  46  12.706324
9       9       0.000104    0.005297  12  16.171538
10     10       0.000011    0.000122  34  12.604360
11     11       0.006966    0.000141  39   3.433063
12     12       0.001723   

# Step 5: Evaluate the model

In [27]:
# Function to compute predictions
def compute_predictions(U, V):
    """Compute the full user-item rating prediction matrix."""
    return U @ V.T

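# Function to evaluate on the test set
def evaluate_model(test_matrix, U, V):
    """Evaluate the model on the observed test ratings and return (MSE, RMSE)."""
    # Convert to COO so that row, col, and data stay aligned entry by entry,
    # including stored entries whose normalized rating is exactly 0.
    test_coo = test_matrix.tocoo()
    row_indices, col_indices = test_coo.row, test_coo.col
    predictions = np.sum(U[row_indices] * V[col_indices], axis=1)
    actual = test_coo.data
    errors = actual - predictions
    mse = np.mean(errors ** 2)
    rmse = np.sqrt(mse)
    return mse, rmse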

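# Compute the full prediction matrix from the learned U and V.
# Note: this materializes a dense (n_users x n_items) array; evaluate_model
# below works directly from U and V and does not need it.
predicted_matrix = compute_predictions(U, V)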

# Evaluate on the test set
test_mse, test_rmse = evaluate_model(test_normalized, U, V)

# Print evaluation metrics
print("\nEvaluation Results on the Test Set:")
print(f"Mean Squared Error (MSE): {test_mse:.4f}")
print(f"Root Mean Squared Error (RMSE): {test_rmse:.4f}")


Evaluation Results on the Test Set:
Mean Squared Error (MSE): 1.8485
Root Mean Squared Error (RMSE): 1.3596
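
An RMSE is easier to interpret next to a trivial baseline. Because the ratings are mean-centered, predicting 0 for every user-item pair amounts to predicting each user's mean rating. The sketch below shows that comparison on small made-up arrays (`actual_centered` and `model_predictions` are hypothetical, not the notebook's data), assuming the test ratings were centered with the same user means as the training data.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two 1-D arrays."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Hypothetical mean-centered test ratings and model predictions.
actual_centered = np.array([0.5, -1.0, 1.5, -0.5])
model_predictions = np.array([0.3, -0.6, 1.0, 0.1])

# Baseline: always predict 0, i.e. the user's own mean rating.
baseline_predictions = np.zeros_like(actual_centered)

model_rmse = rmse(actual_centered, model_predictions)
baseline_rmse = rmse(actual_centered, baseline_predictions)
print(f"Model RMSE:    {model_rmse:.4f}")
print(f"Baseline RMSE: {baseline_rmse:.4f}")
```

If the model's RMSE is not clearly below the baseline's, the latent factors are adding little beyond the per-user mean.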


# Step 6: Get recommendations for users

In [28]:
def get_user_recommendations(user_id, user_map, item_map, U, V, user_means, train_matrix, top_n=10):
    """
    Generate movie recommendations for a specific user.
    
    Args:
        user_id (int): ID of the user to generate recommendations for.
        user_map (dict): Mapping of user IDs to matrix indices.
        item_map (dict): Mapping of movie IDs to matrix indices.
        U (ndarray): User latent factors.
        V (ndarray): Item latent factors.
        user_means (ndarray): Mean rating for each user.
        train_matrix (csr_matrix): The training matrix to check existing interactions.
        top_n (int): Number of recommendations to return.
    
    Returns:
        recommendations (list): List of tuples with (movie_id, predicted_rating).
    """
    # Ensure the user exists in the training data
    if user_id not in user_map:
        raise ValueError(f"User {user_id} not found in training data.")
    
    # Get the user's row index in the matrix
    user_idx = user_map[user_id]
    
    # Compute predictions for all movies
    predictions = U[user_idx] @ V.T
    
    # Denormalize the predictions
    predictions += user_means[user_idx]
    
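    # Find movies the user hasn't rated yet (columns with no stored entry in this row).
    # Using the stored indices avoids treating a stored value of 0 as "unrated".
    rated_items = train_matrix.getrow(user_idx).indices
    unseen_items = np.setdiff1d(np.arange(train_matrix.shape[1]), rated_items)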
    
    # Get predictions for unseen movies only
    unseen_predictions = {item: predictions[item] for item in unseen_items}
    
    # Sort predictions by rating in descending order
    sorted_predictions = sorted(unseen_predictions.items(), key=lambda x: x[1], reverse=True)
    
    # Map matrix indices back to movie IDs and return the top N
    reverse_item_map = {v: k for k, v in item_map.items()}
    recommendations = [(reverse_item_map[item_idx], rating) for item_idx, rating in sorted_predictions[:top_n]]
    
    return recommendations

# Example usage:
# Generate recommendations for user 1
user_id = 1
top_n = 10
recommendations = get_user_recommendations(
    user_id=user_id,
    user_map=train_user_map,
    item_map=train_item_map,
    U=U,
    V=V,
    user_means=train_user_means,
    train_matrix=train_sparse,
    top_n=top_n
)

# Print recommendations
print(f"Top {top_n} recommendations for User {user_id}:")
for movie_id, predicted_rating in recommendations:
    print(f"Movie ID: {movie_id}, Predicted Rating: {predicted_rating:.2f}")


Top 10 recommendations for User 1:
Movie ID: 86781, Predicted Rating: 5.85
Movie ID: 1104, Predicted Rating: 5.82
Movie ID: 750, Predicted Rating: 5.82
Movie ID: 38304, Predicted Rating: 5.71
Movie ID: 6460, Predicted Rating: 5.71
Movie ID: 1446, Predicted Rating: 5.69
Movie ID: 1178, Predicted Rating: 5.69
Movie ID: 951, Predicted Rating: 5.68
Movie ID: 3030, Predicted Rating: 5.68
Movie ID: 5690, Predicted Rating: 5.68
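
Several of the predicted ratings above exceed 5, since nothing constrains the dot product plus the user mean to the rating scale. One possible treatment, sketched below, is to rank by the raw scores and clip only the values that are reported. The function `top_n_unseen` and the bounds `rating_min=0.5` and `rating_max=5.0` are assumptions for illustration; the dataset's actual rating scale is not shown in this section. The sketch also uses `np.argpartition` so that only the top candidates are sorted, rather than every unseen item.

```python
import numpy as np

def top_n_unseen(scores, seen_idx, top_n=10, rating_min=0.5, rating_max=5.0):
    """Return (item_idx, clipped_score) pairs for the top_n unseen items."""
    scores = np.asarray(scores, dtype=float).copy()
    scores[seen_idx] = -np.inf            # exclude items already rated
    n_candidates = int(np.sum(np.isfinite(scores)))
    top_n = min(top_n, n_candidates)
    if top_n == 0:
        return []
    # argpartition selects the top_n largest scores without a full sort;
    # only those top_n entries are then sorted in descending order.
    top = np.argpartition(-scores, top_n - 1)[:top_n]
    top = top[np.argsort(-scores[top])]
    # Rank on raw scores, clip only the value that is reported.
    return [(int(i), float(np.clip(scores[i], rating_min, rating_max))) for i in top]

# Hypothetical denormalized predictions for five items; item 0 is already rated.
scores = np.array([4.2, 5.85, 3.1, 5.2, 4.9])
recs = top_n_unseen(scores, seen_idx=np.array([0]), top_n=3)
print(recs)
```

Clipping after ranking preserves the order of items whose raw scores both exceed the upper bound.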


# Conclusion

In this notebook, we implemented a **Matrix Factorization-based Recommendation System** step by step, from data preparation through evaluation and top-N recommendation. Here is a summary of what we covered:

---

## **1. Problem Definition**
We began by defining the goal: to predict user ratings for items they haven't interacted with and to recommend the top-N items based on these predictions.

---

## **2. Data Exploration and Preprocessing**
- Explored the structure of the dataset (`ratings.csv`).
- Prepared a **User-Item Interaction Matrix** in a memory-efficient sparse format.
- Normalized the data to account for user biases (mean rating per user).
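
The per-user normalization can be sketched directly on rating triplets, as below. The arrays are made up for illustration, and item indices are left out because centering does not use them. This is a minimal sketch of mean-centering in general, not necessarily the exact code used earlier in the notebook.

```python
import numpy as np

# Hypothetical (user index, rating) pairs; one entry per observed rating.
user_idx = np.array([0, 0, 1, 1, 1, 2])
ratings = np.array([4.0, 2.0, 5.0, 3.0, 4.0, 1.0])

n_users = int(user_idx.max()) + 1

# Per-user mean = sum of ratings / number of ratings.
rating_sums = np.bincount(user_idx, weights=ratings, minlength=n_users)
rating_counts = np.bincount(user_idx, minlength=n_users)
user_means = rating_sums / np.maximum(rating_counts, 1)  # guard users with no ratings

# Center each observed rating by its user's mean.
centered = ratings - user_means[user_idx]

# Denormalize: add the mean back to recover the original ratings.
restored = centered + user_means[user_idx]

print(user_means)
print(centered)
```

Note that a centered rating can be exactly 0 (a rating equal to the user's mean). When centered values are stored in a sparse matrix, such entries need care so they are not confused with missing ratings.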

---

## **3. Model Implementation**
- Implemented a **Matrix Factorization Model** using Stochastic Gradient Descent (SGD).
- Incorporated key techniques such as:
  - **Regularization** to prevent overfitting.
  - **Early Stopping** to halt training once the validation loss stops improving.
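
The two techniques above can be sketched together in a compact training loop. This is a minimal sketch on synthetic low-rank data, using one common per-rating form of the SGD update with an L2 penalty; the function `train_mf`, its default values, and the `patience` parameter are assumptions for illustration and may differ from the training loop used earlier in the notebook.

```python
import numpy as np

def train_mf(train, val, n_users, n_items, k=8, lr=0.02, lambda_reg=0.01,
             max_epochs=50, patience=3, seed=0):
    """Train U, V with SGD on (user, item, rating) triplets; stop early on validation loss."""
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    best_val, best_U, best_V, bad_epochs = np.inf, U.copy(), V.copy(), 0

    def mse(data):
        return float(np.mean([(r - U[u] @ V[i]) ** 2 for u, i, r in data]))

    for epoch in range(max_epochs):
        for idx in rng.permutation(len(train)):
            u, i, r = train[idx]
            err = r - U[u] @ V[i]
            u_old = U[u].copy()
            # Gradient step; the lambda_reg terms are the L2 regularization.
            U[u] += lr * (err * V[i] - lambda_reg * U[u])
            V[i] += lr * (err * u_old - lambda_reg * V[i])
        val_loss = mse(val)
        if val_loss < best_val:
            # Keep a copy of the best factors seen so far.
            best_val, best_U, best_V, bad_epochs = val_loss, U.copy(), V.copy(), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # early stopping: no improvement for `patience` epochs
    return best_U, best_V, best_val

# Synthetic rank-3 ratings, split into training and validation triplets.
rng = np.random.default_rng(42)
true_U, true_V = rng.normal(size=(20, 3)), rng.normal(size=(15, 3))
pairs = [(u, i, float(true_U[u] @ true_V[i])) for u in range(20) for i in range(15)]
order = rng.permutation(len(pairs))
train = [pairs[j] for j in order[:240]]
val = [pairs[j] for j in order[240:]]

U_demo, V_demo, best_val = train_mf(train, val, n_users=20, n_items=15)
print(f"Best validation MSE: {best_val:.4f}")
```

Returning the factors from the best epoch, rather than the last one, matches how the logs above report a "Best Validation Loss" that can precede the final epoch.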

---

## **4. Hyperparameter Tuning**
- Used both **Grid Search** and **Optuna** to tune hyperparameters such as:
  - Learning rate
  - Regularization strength
  - Number of latent factors (dimensions of $U$ and $V$).
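- Identified the best hyperparameters using validation loss. The Optuna search selected a learning rate of about 0.0041, a regularization strength of about 1.05e-4, and 27 latent factors, with a best validation loss of 3.0988.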
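
For reference, the grid-search half of the tuning can be sketched with the standard library alone. The helper `grid_search` and the stand-in objective `toy_validation_loss` are hypothetical: the toy function only mimics the shape of a validation-loss surface so that the example runs on its own. In practice the objective would train the model and return its validation loss.

```python
import itertools
import math

def grid_search(objective, grid):
    """Evaluate objective(**params) on every combination in grid."""
    names = list(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((objective(**params), params))
    # Compare on the loss only, so the parameter dicts are never compared.
    best_loss, best_params = min(results, key=lambda pair: pair[0])
    return best_params, best_loss, results

# Stand-in objective for illustration only (NOT a real training run).
def toy_validation_loss(learning_rate, lambda_reg, k):
    return ((math.log10(learning_rate) + 3) ** 2
            + (math.log10(lambda_reg) + 4) ** 2
            + 0.001 * abs(k - 27))

# Learning rate and regularization are spaced on a log scale.
grid = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "lambda_reg": [1e-4, 1e-3, 1e-2],
    "k": [16, 32],
}
best_params, best_loss, results = grid_search(toy_validation_loss, grid)
print(best_params, round(best_loss, 4))
```

The number of runs grows multiplicatively with each hyperparameter (here 3 x 3 x 2 = 18), which is one reason a sampling-based search such as Optuna's becomes attractive for larger spaces.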

---

## **5. Model Evaluation**
- Evaluated the model on a separate **test set** to check generalization.
- Computed the **Mean Squared Error (MSE)** and **Root Mean Squared Error (RMSE)** of the predicted ratings, obtaining a test MSE of 1.8485 and a test RMSE of 1.3596.

---

## **6. Generating Recommendations**
- Implemented a method to generate recommendations for individual users:
  - Predicted ratings for all unseen items.
  - Denormalized predictions to restore the original rating scale.
  - Ranked items by predicted ratings and retrieved the top-N recommendations.

---

## **7. Reflection and Next Steps**
This project demonstrated the process of building a collaborative filtering system based on matrix factorization. The approach works, but it has limitations:
- **Dynamic data**: New users, new items, or updated ratings require retraining or partial updates.
- **Cold-start problems**: Users or items with few or no ratings have poorly learned latent factors (or none at all), so recommendations involving them are unreliable.

### **Future Directions**
To overcome these limitations, we could explore:
- **Advanced Matrix Factorization Approaches**:
  - Incorporate online updates or hybrid models (e.g., blending content-based and collaborative filtering).
- **Neural Networks**:
  - Use deep learning techniques to build a more robust recommendation system.
- **Evaluation Metrics**:
  - Evaluate recommendations using ranking metrics like Precision@K and Recall@K to measure the quality of top-N recommendations.
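
As a starting point for the ranking metrics mentioned above, here is a minimal sketch of Precision@K and Recall@K for a single user. The function name and the item IDs are hypothetical, and the convention used is precision = hits / K (some implementations divide by the number of items actually recommended instead).

```python
def precision_recall_at_k(recommended, relevant, k):
    """
    recommended: ranked list of item IDs (best first).
    relevant: set of item IDs the user actually liked in the held-out data.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical ranked recommendations and held-out relevant items.
recommended = [10, 20, 30, 40, 50]
relevant = {20, 50, 60}

p3, r3 = precision_recall_at_k(recommended, relevant, k=3)
p5, r5 = precision_recall_at_k(recommended, relevant, k=5)
print(f"P@3={p3:.3f}, R@3={r3:.3f}, P@5={p5:.3f}, R@5={r5:.3f}")
```

Averaging these values over all test users gives a summary of top-N quality that RMSE alone does not capture, since RMSE weights every rating equally regardless of its rank.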

---

By following this systematic approach, we gained a deeper understanding of collaborative filtering and built a foundation for creating more advanced and scalable recommendation systems.
