# SVD $R = P \cdot Q^T$ via gradient descent

In [1]:
import numpy as np

We want to obtain the following factorization:
\begin{equation}
    \widehat{r}_{x,i} = q_i \cdot p_x = \sum_{f \in F} q_{if} \cdot p_{fx}
\end{equation}
where $f \in F$ are hidden factors.
Our goal is to find the matrices $P$ and $Q$ that solve
\begin{equation}
  \min_{P,Q} \sum_{(i, x) \in R} (r_{x,i} - q_i \cdot p_x)^2 + \left[ \lambda_1 \sum_{x} \|p_x\|^2 + \lambda_2 \sum_{i} \|q_i\|^2 \right].
\end{equation}
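
As a minimal sketch, the objective can be evaluated with NumPy as follows. The storage layout is an assumption: `P` holds $p_x$ as row `x` and `Q` holds $q_i$ as row `i`, so that $R \approx P \cdot Q^T$ as in the title. The name `objective`, the boolean `mask` of observed entries, and the toy matrix are hypothetical and chosen only for illustration.

```python
import numpy as np

def objective(R, mask, P, Q, lam1, lam2):
    """Regularized squared error over the observed entries of R.

    R    : array of shape (n_x, n_i), R[x, i] = r_{x,i}
    mask : boolean array, True where r_{x,i} is observed
    P    : array of shape (n_x, n_factors), row x is p_x (assumed layout)
    Q    : array of shape (n_i, n_factors), row i is q_i (assumed layout)
    """
    R_hat = P @ Q.T                          # R_hat[x, i] = q_i . p_x
    err = np.where(mask, R - R_hat, 0.0)     # ignore unobserved entries
    return np.sum(err ** 2) + lam1 * np.sum(P ** 2) + lam2 * np.sum(Q ** 2)

# Toy data: missing values are stored as 0, so the mask is simply R > 0.
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0]])
mask = R > 0
P = np.ones((2, 2))   # 2 rows of R, 2 hidden factors
Q = np.ones((3, 2))   # 3 columns of R, 2 hidden factors
loss = objective(R, mask, P, Q, lam1=0.1, lam2=0.1)
```

Using an explicit mask keeps the zeros that stand in for missing values out of the error term, which matches the sum over $(i, x) \in R$ in the objective.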

Gradient descent: Initialize the matrices $P$ and $Q$ (set the missing values in $R$ to 0).

Repeat:

  * $P = P - \alpha \cdot \nabla P$,
      * where $\nabla P$ is the gradient of the objective with respect to $P$: $\nabla P = [\nabla p_{j, x}]$ and $\nabla p_{j,x} = \sum_{i} -2 (r_{x, i} - q_i \cdot p_x) q_{i, j} + 2 \lambda_1 p_{j, x}$, where the sum runs over all $i$ with $(i, x) \in R$, and
  * $Q = Q - \alpha \cdot \nabla Q$,
      * where $\nabla Q$ is the gradient of the objective with respect to $Q$: $\nabla Q = [\nabla q_{i, j}]$ and $\nabla q_{i, j} = \sum_{x} -2 (r_{x, i} - q_i \cdot p_x) p_{j, x} + 2 \lambda_2 q_{i, j}$, where the sum runs over all $x$ with $(i, x) \in R$.
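
The update rules above can be sketched in vectorized form. This is one possible implementation under stated assumptions, not a definitive one: `P` stores $p_x$ as row `x`, `Q` stores $q_i$ as row `i`, and the helper names `masked_loss` and `gd_step`, the step size, the regularization weights, and the toy data are all hypothetical.

```python
import numpy as np

def masked_loss(R, mask, P, Q, lam1, lam2):
    # Regularized squared error over the observed entries only.
    err = np.where(mask, R - P @ Q.T, 0.0)
    return np.sum(err ** 2) + lam1 * np.sum(P ** 2) + lam2 * np.sum(Q ** 2)

def gd_step(R, mask, P, Q, alpha, lam1, lam2):
    # Residuals r_{x,i} - q_i . p_x, zeroed where no value is observed.
    err = np.where(mask, R - P @ Q.T, 0.0)
    # Row x of grad_P: sum_i -2 * err[x, i] * q_i + 2 * lam1 * p_x
    grad_P = -2.0 * err @ Q + 2.0 * lam1 * P
    # Row i of grad_Q: sum_x -2 * err[x, i] * p_x + 2 * lam2 * q_i
    grad_Q = -2.0 * err.T @ P + 2.0 * lam2 * Q
    # Both gradients use the old P and Q, so the update is simultaneous.
    return P - alpha * grad_P, Q - alpha * grad_Q

rng = np.random.default_rng(0)
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0]])
mask = R > 0                                  # zeros mark missing values
P = 0.1 * rng.standard_normal((2, 2))         # small random initialization
Q = 0.1 * rng.standard_normal((3, 2))

loss_before = masked_loss(R, mask, P, Q, 0.1, 0.1)
for _ in range(200):
    P, Q = gd_step(R, mask, P, Q, alpha=0.01, lam1=0.1, lam2=0.1)
loss_after = masked_loss(R, mask, P, Q, 0.1, 0.1)
print(loss_before, loss_after)
```

A small random initialization is used here because starting from all zeros would make both gradients vanish, so the iteration would never move.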

Gradient Descent (GD) vs. Stochastic GD:

  * The full gradient is a sum of per-entry terms: $\nabla Q = \sum_{x, i} \nabla Q(r_{x,i})$, where $\nabla Q(r_{x,i})$ is the contribution of the single entry $r_{x,i}$.
  * GD: $Q = Q - \alpha \cdot \left( \sum_{x, i} \nabla Q(r_{x,i}) \right)$,
  * SGD: $Q = Q - \alpha \cdot \nabla Q(r_{x,i})$. Instead of summing the gradient over all entries before each update, we take a step after each individual entry $r_{x,i}$.
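
To make the contrast concrete, here is a rough sketch of one way to run SGD over the observed entries. It assumes the same row layout as the title ($R \approx P \cdot Q^T$, with $p_x$ in row `x` of `P` and $q_i$ in row `i` of `Q`) and applies the regularization term at every per-entry step, which is a common convention but an assumption here. The function name `sgd_epoch` and all hyperparameter values are hypothetical.

```python
import numpy as np

def sgd_epoch(R, mask, P, Q, alpha, lam1, lam2, rng):
    """One pass over the observed entries, updating P and Q in place."""
    observed = rng.permutation(np.argwhere(mask))   # shuffled (x, i) pairs
    for x, i in observed:
        err = R[x, i] - P[x] @ Q[i]                 # residual of one entry
        p_old = P[x].copy()                         # keep p_x for the q_i step
        P[x] -= alpha * (-2.0 * err * Q[i] + 2.0 * lam1 * P[x])
        Q[i] -= alpha * (-2.0 * err * p_old + 2.0 * lam2 * Q[i])
    return P, Q

def squared_error(R, mask, P, Q):
    # Unregularized squared error over the observed entries.
    return np.sum(np.where(mask, R - P @ Q.T, 0.0) ** 2)

rng = np.random.default_rng(0)
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0]])
mask = R > 0                                  # zeros mark missing values
P = 0.1 * rng.standard_normal((2, 2))
Q = 0.1 * rng.standard_normal((3, 2))

before = squared_error(R, mask, P, Q)
for _ in range(200):
    sgd_epoch(R, mask, P, Q, alpha=0.01, lam1=0.1, lam2=0.1, rng=rng)
after = squared_error(R, mask, P, Q)
print(before, after)
```

Each SGD step touches only one row of `P` and one row of `Q`, so a single step is much cheaper than a full GD step, at the cost of a noisier descent path.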