# Matrix Completion via Gradient Descent with a Probabilistic Perspective (60 pts)



In this problem set, we will focus on the matrix completion problem, where we aim to infer the missing entries of a partially observed matrix. Matrix completion has applications in fields like computer vision, signal processing, and recommendation systems.

We intend to set up an objective function derived from probabilistic assumptions, derive its gradients, and minimize it with gradient descent. Additionally, we will explore practical aspects of implementation and validation.



In [None]:
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
import seaborn as sns

## **Part 1: Theoretical Formulation and Objective Properties (6 pts)**

### **1.1 Probabilistic Framework for Matrix Completion**

One may interpret matrix completion from a probabilistic standpoint. Let $X \in \mathbb{R}^{N \times M}$ represent our target matrix, where only a subset of entries, indexed by $\Omega \subset \{1, \ldots, N\} \times \{1, \ldots, M\}$, is observed. Denoting the observed entries as $X_{\Omega}$, we can formulate the posterior probability distribution for $X$ as:

$$
p(X | X_\Omega) \propto p(X_\Omega | X) p(X),
$$

where:
- $X$ is our estimate of the completed matrix,
- $X_\Omega$ denotes the observed entries of $X$,
- $p(X_\Omega | X)$ is the likelihood of observing $X_\Omega$ given $X$, and
- $p(X)$ is a prior that encourages low-rank solutions.

### **1.2 Low-Rank Matrix Factorization**

To simplify optimization, we approximate $X$ by the product $UV^T$ of two smaller matrices $U \in \mathbb{R}^{N \times K}$ and $V \in \mathbb{R}^{M \times K}$, where $K$ is an estimate of the rank of $X$ with $K \ll \min(N, M)$. We place zero-mean isotropic Gaussian priors with precision parameters $\lambda_U$ and $\lambda_V$ on the rows of $U$ and $V$:

$$
\begin{aligned}
    p(U | \lambda_U) &= \prod_{i = 1}^{N} \mathcal{N}(\boldsymbol{u}_i^T | 0, \lambda_U^{-1} I), \\
    p(V | \lambda_V) &= \prod_{j = 1}^{M} \mathcal{N}(\boldsymbol{v}_j^T | 0, \lambda_V^{-1} I),
\end{aligned}
$$

where $\boldsymbol{u}_i^T$ and $\boldsymbol{v}_j^T$ represent the rows of $U$ and $V$, respectively.

Each observed entry $x_{ij}$ is modeled as Gaussian with mean $\boldsymbol{u}_i^T \boldsymbol{v}_j$ and precision $\lambda$. Hence, our likelihood function is:

$$
p(X_\Omega | U, V, \lambda) = \prod_{(i, j) \in \Omega} \mathcal{N}(x_{ij} | \boldsymbol{u}_i^T \boldsymbol{v}_j, \lambda^{-1}).
$$
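To make the generative model concrete, here is a minimal sketch that draws $U$, $V$ and a partially observed $X$ from the priors and likelihood above. The helper name `sample_model`, the default precisions and the independent-observation fraction `obs_frac` are illustrative assumptions. Note that NumPy's `normal` takes a standard deviation, so each precision is converted via $\sigma = \lambda^{-1/2}$.

```python
import numpy as np

def sample_model(N, M, K, lam=100.0, lam_U=1.0, lam_V=1.0, obs_frac=0.3, seed=0):
    """Hypothetical helper: sample (U, V, X_obs, mask) from the model of Section 1.2."""
    rng = np.random.default_rng(seed)
    # Priors: rows u_i ~ N(0, lam_U^{-1} I), v_j ~ N(0, lam_V^{-1} I); std = precision ** -0.5
    U = rng.normal(0.0, lam_U ** -0.5, size=(N, K))
    V = rng.normal(0.0, lam_V ** -0.5, size=(M, K))
    # Omega: each entry observed independently with probability obs_frac (an assumption)
    mask = rng.random((N, M)) < obs_frac
    # Likelihood: x_ij ~ N(u_i^T v_j, lam^{-1}); noise is drawn everywhere but kept only on Omega
    noise = rng.normal(0.0, lam ** -0.5, size=(N, M))
    X_obs = np.where(mask, U @ V.T + noise, np.nan)  # NaN marks unobserved entries
    return U, V, X_obs, mask

U, V, X_obs, mask = sample_model(N=100, M=80, K=5)
print(np.linalg.matrix_rank(U @ V.T))  # rank of the noiseless mean U V^T, at most K
print(np.isnan(X_obs).mean())          # fraction of missing entries, close to 1 - obs_frac
```

Storing missing entries as NaN mirrors what `DataHandler.apply_missing_entries` does later in this notebook.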


### **1.3 Objective Function Formulation**

Based on this probabilistic setup, we derive the Maximum A Posteriori (MAP) estimate of $U$ and $V$ by minimizing the following objective function:

$$
\mathcal{L}(U, V) = \frac{1}{2} \left\{ \lambda \| \mathcal{P}_\Omega(X) - \mathcal{P}_{\Omega}(U V^T) \|_F^2 + \lambda_U \| U \|_F^2 + \lambda_V \| V \|_F^2 \right\},
$$

where $\mathcal{P}_\Omega$ denotes a projection operator that zeroes out elements not in $\Omega$.

This formulation shows that MAP estimation for matrix completion reduces to an optimization problem over the factors $U$ and $V$.
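A quick numerical sanity check of the MAP claim, as a sketch: with a 0/1 mask standing in for $\mathcal{P}_\Omega$, the change in $\mathcal{L}$ between two parameter settings should equal the change in the negative log-posterior, because the Gaussian normalizing constants do not depend on $U$ and $V$. The helper names and random test data are assumptions for illustration, and `X` must not contain NaNs here (e.g. apply `np.nan_to_num` first).

```python
import numpy as np

def objective(X, mask, U, V, lam, lam_U, lam_V):
    """L(U, V) from Section 1.3; elementwise multiplication by the 0/1 mask implements P_Omega."""
    R = mask * (X - U @ V.T)
    return 0.5 * (lam * np.sum(R ** 2) + lam_U * np.sum(U ** 2) + lam_V * np.sum(V ** 2))

def log_gauss(x, mean, prec):
    """Elementwise log-density of a scalar Gaussian N(x | mean, 1 / prec)."""
    return 0.5 * np.log(prec / (2.0 * np.pi)) - 0.5 * prec * (x - mean) ** 2

def log_posterior_unnorm(X, mask, U, V, lam, lam_U, lam_V):
    """log p(X_Omega | U, V) + log p(U) + log p(V), i.e. the log-posterior up to a constant."""
    obs = mask.astype(bool)
    loglik = np.sum(log_gauss(X[obs], (U @ V.T)[obs], lam))
    # Isotropic priors factorise over the individual entries of U and V
    return loglik + np.sum(log_gauss(U, 0.0, lam_U)) + np.sum(log_gauss(V, 0.0, lam_V))

rng = np.random.default_rng(0)
N, M, K, lam, lam_U, lam_V = 20, 15, 3, 10.0, 1.0, 2.0
X = rng.standard_normal((N, M))
mask = (rng.random((N, M)) < 0.4).astype(float)
params = [(rng.standard_normal((N, K)), rng.standard_normal((M, K))) for _ in range(2)]

obj_vals = [objective(X, mask, U, V, lam, lam_U, lam_V) for U, V in params]
logp_vals = [log_posterior_unnorm(X, mask, U, V, lam, lam_U, lam_V) for U, V in params]
# The two differences agree: L(U, V) = -log p(U, V | X_Omega) + const
print(obj_vals[0] - obj_vals[1], -(logp_vals[0] - logp_vals[1]))
```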

**<u>Subproblem 1</u> (3 pts):** Derive the MAP objective function $\mathcal{L}(U, V)$ and explain briefly why minimizing this objective is equivalent to solving the matrix completion problem.

**Hint:** Start from the likelihood and prior expressions, then take the negative logarithm of the posterior to obtain the objective (up to additive constants).


**<u>Solution to Subproblem 1</u>:**

---

### **1.4 Properties of the Objective and Regularization**

The matrix completion problem can be expressed in terms of finding two low-rank factors $U \in \mathbb{R}^{N \times K}$ and $V \in \mathbb{R}^{M \times K}$ that approximate the observed entries of $X$:

$$
\min_{U, V} \frac{1}{2}\ \| \mathcal{P}_\Omega(X) - \mathcal{P}_{\Omega}(U V^T) \|_F^2 + \lambda_U' \mathcal{R}(U) + \lambda_V' \mathcal{R}(V),
$$

where $\mathcal{R}$ is a regularization function, and $\lambda_U' > 0$ and $\lambda_V' > 0$ are regularization parameters.

An alternative approach mentioned in the lecture minimizes the **nuclear norm** of the completed matrix $Z \in \mathbb{R}^{N \times M}$, defined as the sum of its singular values, as a convex proxy for rank minimization:

$$
\min_{Z} \frac{1}{2}\ \| \mathcal{P}_\Omega(X) - \mathcal{P}_{\Omega}(Z) \|_F^2 + \mu \|Z\|_*,
$$

where $\|Z\|_* = \sum_i \sigma_i(Z)$ denotes the nuclear norm of $Z$, and $\mu > 0$.
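The nuclear norm is straightforward to evaluate numerically. The sketch below builds a random rank-3 test matrix $Z$ (an assumption for illustration), computes $\|Z\|_*$ from its singular values, checks it against NumPy's built-in `np.linalg.norm(Z, 'nuc')`, and shows that a rank-$r$ matrix has only $r$ numerically non-zero singular values, which are exactly the quantities the penalty sums.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((40, 3)) @ rng.standard_normal((30, 3)).T  # 40 x 30, rank 3

s = np.linalg.svd(Z, compute_uv=False)    # singular values in descending order
nuclear = s.sum()                         # ||Z||_* = sum of singular values
print(nuclear, np.linalg.norm(Z, 'nuc'))  # same value, computed two ways

# Numerical rank: count singular values above a relative tolerance
num_rank = int(np.sum(s > 1e-10 * s[0]))
print(num_rank)  # 3
```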

**<u>Subproblem 2</u> (3 pts):** Show that this nuclear-norm minimization problem is equivalent to minimizing the objective $\mathcal{L}(U, V)$ defined earlier. How do these formulations relate to a low-rank $X$? What can you say about the convexity of the objective $\mathcal{L}(U, V)$?

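**Hint:** For simplicity, assume $\lambda_U / \lambda = \lambda_V / \lambda = \mu$ (for example, $\lambda = 1$ and $\lambda_U = \lambda_V = \mu$), and that $K$ is at least the rank of the minimizer $Z$.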

**<u>Solution to Subproblem 2</u>:**



---


## **Part 2: Gradient Descent (16 pts)**

### **2.1 Gradient Evaluation**

Setting both gradients to zero yields a coupled system of nonlinear equations in $U$ and $V$ with no closed-form solution in general, so we instead optimize iteratively with **Gradient Descent (GD)**.

**<u>Subproblem 3</u> (3 pts):** Derive the gradients of $\mathcal{L}(U, V)$ with respect to $U$ and $V$.
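Before using derived gradients in code, it can help to compare them against finite differences. The sketch below is a generic directional check; the helper name `directional_grad_check` and the toy function are assumptions, not part of the assignment. It compares $\langle \nabla f(A), D\rangle$ with the central difference $\big(f(A + \varepsilon D) - f(A - \varepsilon D)\big)/(2\varepsilon)$ for a few random unit directions $D$. To check `grad_U`, one would fix $V$ and pass $U \mapsto \mathcal{L}(U, V)$ as `f`.

```python
import numpy as np

def directional_grad_check(f, grad, A, eps=1e-6, n_dirs=5, seed=0):
    """Largest relative mismatch between <grad(A), D> and a central difference of f along D."""
    rng = np.random.default_rng(seed)
    g = grad(A)
    worst = 0.0
    for _ in range(n_dirs):
        D = rng.standard_normal(A.shape)
        D /= np.linalg.norm(D)                       # unit Frobenius norm
        fd = (f(A + eps * D) - f(A - eps * D)) / (2.0 * eps)
        analytic = np.sum(g * D)                     # Frobenius inner product <g, D>
        worst = max(worst, abs(fd - analytic) / max(1.0, abs(fd), abs(analytic)))
    return worst

# Toy function (not the matrix-completion objective): f(A) = 0.5 ||A||_F^2 + sum(sin(A))
def f(A):
    return 0.5 * np.sum(A ** 2) + np.sum(np.sin(A))

def grad_ok(A):
    return A + np.cos(A)

def grad_bad(A):
    return A - np.cos(A)                             # deliberately wrong sign on one term

A0 = np.random.default_rng(1).standard_normal((6, 3))
print(directional_grad_check(f, grad_ok, A0))   # tiny: finite-difference noise only
print(directional_grad_check(f, grad_bad, A0))  # large: the gradient is wrong
```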


### **2.2 Steepest Gradient Descent and Step Size Calculation**

In **Steepest Gradient Descent**, the step size is chosen at every iteration by an exact line search. The optimal step size $\alpha_U$ for $U$ minimizes the objective along the negative gradient direction, with $V$ held fixed:

$$
\alpha_U = \arg\min_{\alpha \geq 0} \mathcal{L}(U - \alpha \nabla_U \mathcal{L}, V).
$$

**<u>Subproblem 4</u> (10 pts):** Derive optimal step sizes $\alpha_U$ and $\beta_V$ for updates to $U$ and $V$. These can significantly speed up convergence compared to fixed step sizes.
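A derived closed form for $\alpha_U$ can be checked numerically. With $V$ fixed, $U$ enters $UV^T$ linearly, so $\varphi(\alpha) = \mathcal{L}(U - \alpha D, V)$ is exactly quadratic in $\alpha$ for any direction $D$ (for steepest descent, $D = \nabla_U \mathcal{L}$). Its minimizer over $\alpha \geq 0$ can therefore be recovered from the three values $\varphi(-1)$, $\varphi(0)$, $\varphi(1)$. The sketch below uses a random direction as a stand-in for the gradient; the helper names are assumptions.

```python
import numpy as np

def objective(X, mask, U, V, lam=1.0, lam_U=1.0, lam_V=1.0):
    """L(U, V) from Section 1.3 with a 0/1 mask in place of P_Omega."""
    R = mask * (X - U @ V.T)
    return 0.5 * (lam * np.sum(R ** 2) + lam_U * np.sum(U ** 2) + lam_V * np.sum(V ** 2))

def exact_step_from_quadratic(phi):
    """argmin over alpha >= 0 of phi, assuming phi(alpha) = a alpha^2 + b alpha + c with a > 0."""
    p_minus, p_zero, p_plus = phi(-1.0), phi(0.0), phi(1.0)
    a = 0.5 * (p_plus + p_minus) - p_zero   # curvature (> 0 when lam_U > 0 and D != 0)
    b = 0.5 * (p_plus - p_minus)            # slope at alpha = 0
    return max(0.0, -b / (2.0 * a))

rng = np.random.default_rng(0)
N, M, K = 30, 20, 3
X = rng.standard_normal((N, K)) @ rng.standard_normal((M, K)).T
mask = (rng.random((N, M)) < 0.5).astype(float)
U, V = rng.standard_normal((N, K)), rng.standard_normal((M, K))
D = rng.standard_normal((N, K))             # stand-in for grad_U

def phi(alpha):
    return objective(X, mask, U - alpha * D, V)

alpha_star = exact_step_from_quadratic(phi)

# Brute-force comparison on a grid of non-negative step sizes
grid = np.linspace(0.0, 2.0 * alpha_star + 1.0, 4001)
print(alpha_star, grid[np.argmin([phi(a) for a in grid])])
```

Once a closed-form $\alpha_U$ has been derived, it should agree with `alpha_star` up to rounding; the same check applies to $\beta_V$ with $U$ fixed.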

### **2.3 Alternating Optimization and Complexity Analysis**

In **Alternating Optimization**, we sequentially update $U$ and $V$:

$$
\begin{aligned}
    U_{k+1} &= U_k - \alpha_k \nabla_U \mathcal{L}(U_k, V_k), \\
    V_{k+1} &= V_k - \beta_k \nabla_V \mathcal{L}(U_{k+1}, V_k).
\end{aligned}
$$

**<u>Subproblem 5</u> (3 pts):** Estimate the computational complexity per iteration in terms of $|\Omega|$ (observed entries) and $K$ (rank estimate). Identify opportunities to reuse computations from the $U$ update in the $V$ update to improve efficiency.
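One way to see where the $|\Omega|$ dependence can come from: the provided `MatrixCompletion` skeleton works with dense $N \times M$ arrays, so forming `U @ V.T` costs $O(NMK)$ per call, although the residual is only needed on $\Omega$. The sketch below assumes $\Omega$ is stored as index arrays `rows`, `cols` (the provided code does not do this) and evaluates only those entries, in $O(|\Omega| K)$ operations; the helper name is hypothetical.

```python
import numpy as np

def masked_residual_sparse(X, rows, cols, U, V):
    """Residual x_ij - u_i^T v_j for (i, j) in Omega only, in O(|Omega| K) operations."""
    pred = np.einsum('ik,ik->i', U[rows], V[cols])  # row-wise dot products u_i^T v_j
    return X[rows, cols] - pred

rng = np.random.default_rng(0)
N, M, K = 200, 150, 5
X = rng.standard_normal((N, K)) @ rng.standard_normal((M, K)).T
mask = rng.random((N, M)) < 0.1
rows, cols = np.nonzero(mask)                  # the |Omega| observed index pairs
U, V = rng.standard_normal((N, K)), rng.standard_normal((M, K))

r_sparse = masked_residual_sparse(X, rows, cols, U, V)  # length-|Omega| vector
r_dense = mask * (X - U @ V.T)                          # O(N M K), as in the dense code
print(len(r_sparse), np.allclose(r_dense[rows, cols], r_sparse))
```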

**<u>Solutions to Subproblems 3, 4 and 5</u>:**



---




## **Part 3: Implementation and Experiments (20 pts)**

To validate our method, we use the relative error on validation and training masks, defined as:

$$
\mathrm{RelError} = \frac{\|\mathcal{P}_{\Omega_{\text{val}}}(X - \hat{X})\|_F}{\| \mathcal{P}_{\Omega_{\text{val}}}(X) \|_F }
$$

where $X$ is the original matrix, $\hat{X}$ the completion, and $\mathcal{P}_{\Omega_{\text{val}}}$ the projection onto the validation set $\Omega_{\text{val}}$ of known entries (in code, an elementwise product with the 0/1 validation mask). The training error is defined analogously with $\Omega_{\text{train}}$. The provided function `train_val_masks` splits the known entries into these two sets for you. This error metric is not image-specific, but it serves our purpose here.


In [None]:
def train_val_masks(X, frac: float=0.8, is_image: bool=False, random_state=0) -> tuple:
    """Split the observed entries of X into training and validation masks.

    NaNs are replaced by 0 (unless is_image), and zero entries are treated as unobserved.
    Roughly `frac` of the observed entries go to training, the rest to validation.
    """
    rng = np.random.default_rng(random_state)
    observed_target = np.nan_to_num(X) if not is_image else np.copy(X)
    mask = (observed_target != 0)
    train = (rng.random(observed_target.shape) < frac) & mask  # boolean training mask
    val = ~train & mask                                         # remaining observed entries
    return train.astype(int), val.astype(int), observed_target

In [None]:
class DataHandler:
    """Handles image and synthetic data for matrix completion tasks."""
    def __init__(self, shape=None, rank=None, target_img_path=None, observed_img_path=None, random_state=42):
        self.rng = np.random.default_rng(random_state)
        if shape and rank:
            self.target = self._generate_synthetic_data(shape, rank)
            self._observed_target = self.target.copy()
            self.is_image = False
        elif target_img_path and observed_img_path:
            self.target = np.array(Image.open(target_img_path)).astype('int')
            self._observed_target = np.array(Image.open(observed_img_path)).astype('int')
            self.is_image = True
        else:
            raise ValueError("Provide either shape and rank for synthetic data or paths for images.")
        self.N, self.M = self.target.shape

    @property
    def observed_target(self):
        """Read-only access to the observed target matrix."""
        return self._observed_target

    def _generate_synthetic_data(self, shape, rank):
        """Generate a low-rank matrix for synthetic data."""
        U, V = self.rng.standard_normal((shape[0], rank)), self.rng.standard_normal((shape[1], rank))
        return U @ V.T

    def apply_missing_entries(self, missed_mode='uniform', frac=0.8, window_shape=None):
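        """Mark entries of observed_target as missing (NaN).

        'uniform': a random fraction `frac` of all entries is removed.
        'window': a random contiguous block of shape `window_shape` is removed.
        """
        assert missed_mode in {'uniform', 'window'}, f"Invalid missed_mode '{missed_mode}'"
        if missed_mode == 'uniform':
            R = self.rng.choice(self.target.size, int(self.target.size * frac), replace=False)
            self._observed_target.flat[R] = np.nan  # .flat writes in place; ravel() may return a copy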
        elif missed_mode == 'window' and window_shape:
            n, m = window_shape
            if (n > self.N) or (m > self.M):
                raise ValueError("Submatrix shape is larger than the matrix dimensions.")
            row, col = self.rng.integers(0, self.N - n + 1), self.rng.integers(0, self.M - m + 1)
            self._observed_target[row:row + n, col:col + m] = np.nan
        return self._observed_target


    def visualize(self, scale=0.5):
        """Display the target and observed matrices/images side by side."""
        plt.figure(figsize=(15, 4))
        grid = plt.GridSpec(1, 2, wspace=0.4)
        for i, (data, title) in enumerate(zip([self.target, self._observed_target], ['Target', 'Observed'])):
            ax = plt.subplot(grid[0, i])
            sns.heatmap(data, ax=ax, xticklabels=int(self.M * scale), yticklabels=int(self.N * scale))
            ax.set_title(f"{title} {'Image' if self.is_image else 'Matrix'}")
        plt.show()

### **3.1 Synthetic Data Validation**

**<u>Subproblem 6</u>: (10 pts)**
- Implement `forward` (computing the approximation $UV^T$), the gradient properties `grad_U` and `grad_V`, the `step` method, the steepest-descent step sizes in `_lr_steepestGD`, and `relative_err` and `loss` for computing the relative error and the objective value, respectively.
- Run the method on the synthetic data `obs_win` and `obs_uni` (defined below).
- Plot the loss and relative error for each dataset (`obs_win` and `obs_uni`). Compare Gradient Descent (GD) with Steepest Gradient Descent (Steepest GD) and report your findings.

**Hint:** If using Python 3.5+ and NumPy 1.10+, you can use the `@` operator to simplify matrix expressions.

In [None]:
class MatrixCompletion:
    """Performs matrix completion using gradient-based optimization."""
    def __init__(self, shape, rank, lambd=1e6, lambda_U=1.0, lambda_V=1.0,
                 lr_U=0.001, lr_V=0.001, steepest_descent=False, random_state=0):
        self.rank = rank
        self.lambd_U, self.lambd_V = lambda_U / lambd, lambda_V / lambd
        self.lr_U, self.lr_V = lr_U, lr_V
        self.steepest_descent = steepest_descent
        self.rng = np.random.default_rng(random_state)
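        # Draw initial rows from the priors N(0, lambda^{-1} I); NumPy expects a std, i.e. lambda ** -0.5
        self.U = self.rng.normal(0.0, 1.0 / np.sqrt(lambda_U), size=(shape[0], rank))
        self.V = self.rng.normal(0.0, 1.0 / np.sqrt(lambda_V), size=(shape[1], rank))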

    def forward(self):
        """Compute the matrix approximation U @ V.T."""
        return ### YOUR CODE HERE

    def step(self, X_hat, X, mask, epoch):
        """Single optimization step using gradient descent."""
        self.masked_residual = mask * (X - X_hat)
        if self.steepest_descent:
            self.lr_U = self._lr_steepestGD('U', mask, X)
        # Update U factor
        ### YOUR CODE HERE

        X_hat = self.forward()
        self.masked_residual = mask * (X - X_hat)
        if self.steepest_descent:
            self.lr_V = self._lr_steepestGD('V', mask, X)
        # Update V factor
        ### YOUR CODE HERE

        if epoch % 20 == 0:
            print(f"Epoch {epoch}: Train Loss={self.loss(X_hat, X, mask):.4f}")

    def loss(self, X_hat, X, mask):
        """Compute the loss for the masked matrix."""
        return ### YOUR CODE HERE

    def relative_err(self, X_hat: np.ndarray, X: np.ndarray, mask: np.ndarray) -> float:
        """Relative error ||mask * (X - X_hat)||_F / ||mask * X||_F on the given mask."""
        return ### YOUR CODE HERE

    def _lr_steepestGD(self, param: str, mask: np.ndarray, X) -> float:
        """Compute the learning rate for steepest descent."""
        if param == 'U':
            return ### YOUR CODE HERE
        else:
            return ### YOUR CODE HERE

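    @property
    def grad_U(self):
        """Gradient of the objective with respect to U (Subproblem 3)."""
        return ### YOUR CODE HERE

    @property
    def grad_V(self):
        """Gradient of the objective with respect to V (Subproblem 3)."""
        return ### YOUR CODE HERE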

    def show(self, scale: float=0.5, cmap: str=None):
        """Visualize the output matrix after completion."""
        X_out = self.forward()
        plt.figure(figsize=(10.0, 5.0))
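        sns.heatmap(X_out, xticklabels=int(X_out.shape[1] * scale),
                    yticklabels=int(X_out.shape[0] * scale), cmap=cmap)
        # Subclasses such as MatrixCompletionSI also satisfy isinstance(self, MatrixCompletion),
        # so the exact type is compared instead
        title = "Completed Matrix" if type(self) is MatrixCompletion else "Completed Matrix with Side Information"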
        plt.title(title)
        plt.show()

In [None]:
def trainer(model, X, train_mask, val_mask, max_epochs):
    """Train the matrix completion model and track loss/error over epochs."""
    train_losses, val_losses, train_errs, val_errs = [], [], [], []
    for epoch in range(max_epochs):
        X_hat = model.forward()
        train_losses.append(model.loss(X_hat, X, train_mask))
        train_errs.append(model.relative_err(X_hat, X, train_mask))
        model.step(X_hat, X, train_mask, epoch)
        if val_mask is not None:
            # Validation metrics reuse X_hat from before this step, matching the training metrics
            val_losses.append(model.loss(X_hat, X, val_mask))
            val_errs.append(model.relative_err(X_hat, X, val_mask))
    return train_losses, val_losses, train_errs, val_errs

In [None]:
# Windowed missing entries
matrix_win = DataHandler(shape=(300, 300), rank=5)
obs_win = matrix_win.apply_missing_entries(missed_mode='window', window_shape=(50, 50))
matrix_win.visualize()

mask_train_win, mask_val_win, data_win = train_val_masks(obs_win, frac=0.8)

model_win = MatrixCompletion(shape=data_win.shape, rank=5, lambd=1e6, steepest_descent=True)

In [None]:
train_losses, val_losses, train_errs, val_errs = trainer(model_win, data_win, mask_train_win, mask_val_win, max_epochs=200)

In [None]:
plt.semilogy(train_errs, label='Train')
plt.semilogy(val_errs, label='Validation')
plt.legend();

In [None]:
# Uniform missing entries
matrix_uni = DataHandler(shape=(300, 300), rank=5)
obs_uni = matrix_uni.apply_missing_entries(missed_mode='uniform', frac=0.8)
matrix_uni.visualize()

mask_train_uni, mask_val_uni, data_uni = train_val_masks(obs_uni, frac=0.8)

model_uni = MatrixCompletion(shape=data_uni.shape, rank=5, lambd=1e6, steepest_descent=True)

In [None]:
%%time
train_losses, val_losses, train_errs, val_errs = trainer(model_uni, data_uni, mask_train_uni, mask_val_uni, max_epochs=200)

In [None]:
plt.semilogy(train_errs, label='Train')
plt.semilogy(val_errs, label='Validation')
plt.legend();

### **3.2 Real Data Completion: Cropped Window**

**<u>Subproblem 7</u> (5 pts):**
- Discuss how to estimate the rank when only part of an image is available.
- Test the implementation on `fields_observed.png`.

**Hint:** To address the rank question, recall the lecture and seminar on this topic. There are two straightforward approaches to consider.
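As one illustration only (a common heuristic, not necessarily either of the two approaches the hint refers to): given a fully observed block of a matrix, one can pick the smallest $k$ whose top-$k$ singular values capture a chosen fraction of the spectral energy $\sum_i \sigma_i^2$. The helper name `rank_by_energy` and the 0.99 threshold are assumptions; on a synthetic rank-5 matrix with small noise, this sketch recovers $k = 5$.

```python
import numpy as np

def rank_by_energy(A, energy=0.99):
    """Smallest k with sum_{i<=k} s_i^2 >= energy * sum_i s_i^2 (s = singular values of A)."""
    s = np.linalg.svd(A, compute_uv=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(cumulative, energy)) + 1
    return min(k, len(s))

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 5)) @ rng.standard_normal((300, 5)).T  # rank 5
X_noisy = X + 0.01 * rng.standard_normal(X.shape)                     # small perturbation
print(rank_by_energy(X_noisy))  # 5
```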

In [None]:
### YOUR CODE FOR ESTIMATING THE RANK OF 'fields_observed.png' BASED ON OBSERVED ENTRIES

In [None]:
image_fld = DataHandler(target_img_path='./data/fields.png', observed_img_path='./data/fields_observed.png')
obs_fld = image_fld.observed_target
image_fld.visualize()

mask_train_fld, mask_val_fld, data_fld = train_val_masks(obs_fld, frac=0.8)

model_fld = MatrixCompletion(shape=data_fld.shape, rank=50, lambd=1e6, steepest_descent=True)

In [None]:
train_losses, val_losses, train_errs, val_errs = trainer(model_fld, data_fld, mask_train_fld, mask_val_fld, max_epochs=200)

In [None]:
plt.semilogy(train_errs, label='Train')
plt.semilogy(val_errs, label='Validation')
plt.legend();

In [None]:
model_fld.show()

### **3.3 Real Data Completion: Uniform Noise**

**<u>Subproblem 8</u> (5 pts):** Run the method on the `peppers_observed_<num>_percent.png` images for $\text{num} \in \{50, 60, 70, 80, 90\}$, where each image has a different percentage of randomly missing pixels. Estimate the rank as done for the fields image.

In [None]:
### YOUR CODE FOR ESTIMATING THE RANK OF 'peppers_observed_num_percent.png' BASED ON OBSERVED ENTRIES


In [None]:
image_pep = DataHandler(target_img_path='./data/pepper.png', observed_img_path='./data/peppers_observed_70_percent.png')
obs_pep = image_pep.observed_target
image_pep.visualize()

mask_train_pep, mask_val_pep, data_pep = train_val_masks(obs_pep, frac=0.8)

model_pep = MatrixCompletion(shape=data_pep.shape, rank=50, lambd=1e6, steepest_descent=True)

In [None]:
train_losses, val_losses, train_errs, val_errs = trainer(model_pep, data_pep, mask_train_pep, mask_val_pep, max_epochs=200)

Epoch 0: Train Loss=78968424.8359
Epoch 20: Train Loss=1965178.4635
Epoch 40: Train Loss=935021.5541
Epoch 60: Train Loss=599446.7286
Epoch 80: Train Loss=434834.2247
Epoch 100: Train Loss=337948.3340
Epoch 120: Train Loss=274882.1989
Epoch 140: Train Loss=231080.5911
Epoch 160: Train Loss=199129.2769
Epoch 180: Train Loss=174878.8973


In [None]:
plt.semilogy(train_errs, label='Train')
plt.semilogy(val_errs, label='Validation')
plt.legend();

In [None]:
model_pep.show()

## **Part 4: Bonus - Matrix Completion with Side Information (18 pts)**

### **4.1 Factorization with Side Information**

The effectiveness of the method can sometimes be enhanced by incorporating prior knowledge about the problem. Suppose we know that the columns and rows of $X$ lie in known lower-dimensional subspaces, meaning there exist full-column-rank matrices
$$
    G = \begin{pmatrix} \boldsymbol{g}_1^T \\ \vdots \\ \boldsymbol{g}_{N}^T \end{pmatrix} \in \mathbb{R}^{N \times n}, \quad H = \begin{pmatrix} \boldsymbol{h}_1^T \\ \vdots \\ \boldsymbol{h}_{M}^T \end{pmatrix} \in \mathbb{R}^{M \times m},
$$
where $n < N$ and $m < M$, such that
$$
\begin{aligned}
    \mathrm{col}(X) &\subseteq \mathrm{col}(G), \\
    \mathrm{col}(X^T) &\subseteq \mathrm{col}(H).
\end{aligned}
$$

Under this assumption, one can express $X$ as:

$$
    X = G U V^T H^T,
$$

where $U \in \mathbb{R}^{n \times K}$ and $V \in \mathbb{R}^{m \times K}$ with $\text{rank}(X) \leq K$. We model this as:

$$
    \begin{aligned}
        p(X_\Omega | U, V, \lambda) &= \prod\limits_{(i, j) \in \Omega} \mathcal{N}(x_{i j} | \boldsymbol{g}_{i}^T U V^T \boldsymbol{h}_{j}, \lambda^{-1}) \\
        p(U | \lambda_U) &= \prod\limits_{l = 1}^{n} \mathcal{N}(\boldsymbol{u}_{l}^T | 0, \lambda_U^{-1} I) \\
        p(V | \lambda_V) &= \prod\limits_{s = 1}^{m} \mathcal{N}(\boldsymbol{v}_{s}^T | 0, \lambda_V^{-1} I).
    \end{aligned}
$$

Then, under the new assumptions, the posterior distribution is
$$
    p(U, V | X_\Omega, \lambda, \lambda_U, \lambda_V) \propto p(X_\Omega | U, V, \lambda) \, p(U | \lambda_U) \, p(V | \lambda_V).
$$
Taking the negative logarithm of the posterior and dropping additive constants gives, in matrix form:
$$
    \mathcal{L}_{\text{SI}}(U, V) = \frac{1}{2} \left\{ \lambda \| \mathcal{P}_\Omega(X) - \mathcal{P}_\Omega(G U V^T H^T) \|_F^2 + \lambda_U \| U \|_F^2 + \lambda_V \| V \|_F^2 \right\}.
$$

We seek the MAP estimate:
$$
    \mathcal{L}_{\text{SI}}(U, V) \rightarrow \min_{U, V} \quad \text{s.t.} \quad U \in \mathbb{R}^{n \times K}, \, V \in \mathbb{R}^{m \times K}.
$$


We aim to minimize $\mathcal{L}_{\text{SI}}(U, V)$ over $U$ and $V$ using similar gradient descent updates with the matrices $G$ and $H$.
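The structural assumption behind $X = G U V^T H^T$ can be checked numerically. The sketch below uses random $G$, $H$, $U$, $V$ (stand-ins for illustration, not the provided side-information files) and verifies, with the orthogonal projectors $P_G = G G^{+}$ and $P_H = H H^{+}$, that $\mathrm{col}(X) \subseteq \mathrm{col}(G)$, $\mathrm{col}(X^T) \subseteq \mathrm{col}(H)$ and $\mathrm{rank}(X) \leq K$. It also prints the reduction in unknowns from $(N + M)K$ to $(n + m)K$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, n, m, K = 60, 50, 12, 10, 4
G = rng.standard_normal((N, n))  # random stand-ins; full column rank with probability 1
H = rng.standard_normal((M, m))
U = rng.standard_normal((n, K))
V = rng.standard_normal((m, K))

X = G @ U @ V.T @ H.T            # N x M matrix built as in Section 4.1

def col_projector(A):
    """Orthogonal projector onto col(A), computed via the Moore-Penrose pseudo-inverse."""
    return A @ np.linalg.pinv(A)

print(np.allclose(col_projector(G) @ X, X))      # col(X) is contained in col(G)
print(np.allclose(col_projector(H) @ X.T, X.T))  # col(X^T) is contained in col(H)
print(np.linalg.matrix_rank(X) <= K)
print("unknowns without / with side information:", (N + M) * K, (n + m) * K)
```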

In [None]:
def load_side_info(si_dic: str = './data/', img_name: str = 'fields'):
    """Load precomputed matrices G and H for side information."""
    return (
        np.load(f'{si_dic}G_matrix_{img_name}.npy'),
        np.load(f'{si_dic}H_matrix_{img_name}.npy')
    )

**<u>Subproblem 9</u> (3 pts):**
Compute gradients of $\mathcal{L}_{\text{SI}}$ with respect to $U$ and $V$.

**<u>Solution to Subproblem 9</u>:**



---


Steepest descent, i.e., an exact line search for each factor, can also be employed here.

**<u>Subproblem 10</u> (5 pts):**
Derive the step sizes $\alpha_U$ and $\beta_V$ for $U$ and $V$ in steepest descent.

**<u>Solution to Subproblem 10</u>:**



---


**<u>Subproblem 11</u> (10 pts):**
Repeat the experiments on the real images, now applying the method with side information.

In [None]:
class MatrixCompletionSI(MatrixCompletion):
    """Matrix Completion with Side Information Matrices G and H."""
    def __init__(self, G, H, rank, **kwargs):
        shape = (G.shape[1], H.shape[1])
        super().__init__(shape=shape, rank=rank, **kwargs)
        self.G, self.H = G, H

    def forward(self):
        """Compute matrix approximation with side information."""
        return ### YOUR CODE HERE

    @property
    def grad_U(self):
        return ### YOUR CODE HERE

    @property
    def grad_V(self):
        return ### YOUR CODE HERE

    def _lr_steepestGD(self, param: str, mask: np.ndarray, X) -> float:
        if param == 'U':
            return ### YOUR CODE HERE
        else:
            return ### YOUR CODE HERE

In [None]:
# Image: fields

G_fld, H_fld = load_side_info(img_name='fields')

mask_train_fld, mask_val_fld, data_fld = train_val_masks(obs_fld, frac=0.8)

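# `r` is not defined in the provided cells: set it to the rank estimated for this image (Subproblem 7)
model_fld_si = MatrixCompletionSI(G_fld, H_fld, rank=r, lambd=1e6, steepest_descent=True)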

In [None]:
train_losses_si, val_losses_si, train_errs_si, val_errs_si = trainer(model_fld_si, data_fld, mask_train_fld, mask_val_fld, max_epochs=200)

In [None]:
plt.semilogy(train_errs_si, label='Train')
plt.semilogy(val_errs_si, label='Validation')
plt.legend();

In [None]:
model_fld_si.show()

In [None]:
# Image: pepper

G_pep, H_pep = load_side_info(img_name='pepper')

mask_train_pep_si, mask_val_pep_si, data_pep = train_val_masks(obs_pep, frac=0.8)

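# `r` is not defined in the provided cells: set it to the rank estimated for this image (Subproblem 8)
model_pep_si = MatrixCompletionSI(G_pep, H_pep, rank=r, lambd=1e6, steepest_descent=True)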

In [None]:
train_losses_si, val_losses_si, train_errs_si, val_errs_si = trainer(model_pep_si, data_pep, mask_train_pep_si, mask_val_pep_si, max_epochs=200)

Epoch 0: Train Loss=70450474.6590
Epoch 20: Train Loss=1096290.7651
Epoch 40: Train Loss=708811.5183
Epoch 60: Train Loss=597904.8178
Epoch 80: Train Loss=545734.1375
Epoch 100: Train Loss=514562.4336
Epoch 120: Train Loss=493036.6314
Epoch 140: Train Loss=476694.5734
Epoch 160: Train Loss=463518.7184
Epoch 180: Train Loss=452486.2511


In [None]:
plt.semilogy(train_errs_si, label='Train')
plt.semilogy(val_errs_si, label='Validation')
plt.legend();

In [None]:
model_pep_si.show()