# Our Movie-Recommendation algorithm

Recall from our last lecture that we have 
$$
A = \begin{bmatrix} | & | & \cdots & | \\ \tilde u_1 & \tilde u_2 & \cdots & \tilde u_n \\
| & | & \cdots & |\end{bmatrix} \begin{bmatrix}\rule{1em}{1pt} &  v_1^T & \rule{1em}{1pt} \\ \rule{1em}{1pt} &  v_2^T & \rule{1em}{1pt} \\ & \vdots & \\ \rule{1em}{1pt} &  v_n^T & \rule{1em}{1pt}\end{bmatrix}.
$$
Notice that $\tilde U$ holds the vectors $\tilde u_i$ as its columns and $V^T$ holds the vectors $v_i^T$ as its rows. Compare this to our data matrix, $A$, 
$$
A = \begin{pmatrix}
3 & 4 & 2 & 1 \\
5 & 1 & 3 & 1\\
1 & 1 & 2 & 4\\
3 & 3 & 3 & 3\\
2 & 1 & 4 & 4
\end{pmatrix},
$$
where each row holds one person's scores and each column holds one movie's ratings. So **the left-singular vectors**, which are columns, **form a basis for the movies** and **the right-singular vectors**, which are rows, **form a basis for the people.**

So for example, we might find that
$$ 
\begin{align*}
\tilde u_1 &= \text{a typical action movie}\\
\tilde u_2 &= \text{a typical comedy movie}\\
\tilde u_3 &= \text{a typical horror movie}\\
\tilde u_4 &= \text{a typical anime movie}\\
&\vdots
\end{align*},
$$
etc. So just like before when I wrote, 
$$
\begin{pmatrix} 6 \\ 4 \end{pmatrix} = 6 \begin{pmatrix} 1 \\0 \end{pmatrix} + 4 \begin{pmatrix} 0 \\ 1 \end{pmatrix},
$$
writing the vector as a sum of its two basis elements, now we can have something like
$$ \text{The Batman} = \alpha_1 \tilde u_1 + \alpha_2 \tilde u_2 + \ldots,$$
where $\alpha_1$ tells me "*how much of an action movie The Batman is*", $\alpha_2$ tells me "*how much of a comedy movie The Batman is*", etc. Then *every single movie will be some combination of those different types.* Now, we don't actually get to choose those genres (the basis). Instead, if we plug in $4$ movies, the SVD will find a $4$-dimensional basis for us: it will find $4$ characteristics. If we plug in $1000$ movies, it will find a $1000$-dimensional basis: $1000$ characteristics. We won't be able to say in words what those characteristics are. The SVD finds them on its own, and the basis elements are hard to interpret, but what it's doing is *finding some interrelatedness between movies* based on how people rated them. For example, one of the basis elements could be *the year the movie came out*, because certain groups of people may rate movies similarly based on the year, or *whether or not the movie had Leonardo DiCaprio in it*, etc. **We don't know what that basis is.**

What about the right-singular vectors? Well, they provide a basis for our other space: people. For example, we may have something like
$$
\begin{align*}
v_1 &= \text{ typical action fan}\\
v_2 &= \text{ typical comedy fan}\\
v_3 &= \text{ typical horror fan}\\
v_4 &= \text{ typical anime fan}\\
& \vdots
\end{align*}
$$


etc. So now we have something like


$$
A = \tilde U V^T = \begin{bmatrix} | & | & \cdots & \\
\text{action movie} & \text{comedy movie} & \cdots & \\ | & | & \cdots & \end{bmatrix} \begin{bmatrix} \rule{1em}{1pt} & \text{action fan} & \rule{1em}{1pt}\\
\rule{1em}{1pt} & \text{comedy fan} & \rule{1em}{1pt} \\ 
\vdots & \vdots & \vdots \end{bmatrix}
$$


So if we want to know Adnan's score for "The Batman", which is $A(1,1)$, we compute
$$
\begin{align*}
A(1,1) &= \text{Adnan's score for "The Batman"}\\
&= (\text{How much of an action movie is The Batman})\\
&\qquad \qquad \cdot(\text{How much does Adnan like action movies}) \\
& \qquad + (\text{How much of a comedy movie is The Batman})\\
& \qquad \qquad \cdot(\text{How much does Adnan like comedy movies}) + \ldots
\end{align*}
$$
etc.
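
To make that sum concrete, here is a small sketch using the $5 \times 4$ example matrix from above. I am assuming that $\tilde U$ is $U$ with the singular values absorbed into its columns (so that $A = \tilde U V^T$); the variable names `U_tilde`, `terms`, and `entry` are just mine for illustration.

```python
import numpy as np

# Example ratings matrix from the text: rows are people, columns are movies.
A = np.array([[3, 4, 2, 1],
              [5, 1, 3, 1],
              [1, 1, 2, 4],
              [3, 3, 3, 3],
              [2, 1, 4, 4]], dtype=float)

U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Assumption: U_tilde absorbs the singular values, so A = U_tilde @ Vt.
U_tilde = U * S  # broadcasting scales column k of U by S[k]

# One product per basis pair: (entry of column k of U_tilde) * (entry of row k of Vt).
terms = U_tilde[0, :] * Vt[:, 0]
entry = terms.sum()

print(terms)            # the individual contributions to A(1,1)
print(entry, A[0, 0])   # the contributions add up to the original score
```

Each number in `terms` is one of the products in the sum above, and together they add back up to the score in the first row and first column.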

## Our data
We don't have all of the data yet, but I'll be looking at some preliminary data today. Before we do so, I want to change one thing I said before. In the example above we had 
$$
A = \begin{pmatrix}
3 & 4 & 2 & 1 \\
5 & 1 & 3 & 1\\
1 & 1 & 2 & 4\\
3 & 3 & 3 & 3\\
2 & 1 & 4 & 4
\end{pmatrix},
$$
but that's not really what our data is going to look like because not every person will have seen every movie. Instead, we will actually have something like
$$
A = \begin{pmatrix}
3 & \rule{1em}{.1em} & 2 & \rule{1em}{.1em} \\
\rule{1em}{.1em} & 1 & 3 & \rule{1em}{.1em}\\
1 & 1 & 2 & 4\\
3 & 3 & \rule{1em}{.1em} & 3\\
\rule{1em}{.1em} & 1 & 4 & 4
\end{pmatrix},
$$
where the blanks mean the person hasn't seen the movie. Our goal is to fill in those missing entries. This problem is called **matrix completion**, and we will be making a *matrix-completion algorithm*. 

We are going to discuss a couple of naive ways to solve this problem before talking about our actual algorithm.

#### Option 1 - column average (average score for movie)
First, we can fill in the missing entries with the average score for that movie. This would lead to
$$
A = \begin{pmatrix}
3 & \underline{1.5} & 2 & \underline{3.7} \\
\underline{2.3} & 1 & 3 & \underline{3.7}\\
1 & 1 & 2 & 4\\
3 & 3 & \underline{2.75} & 3\\
\underline{2.3} & 1 & 4 & 4
\end{pmatrix}.
$$
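
Here is one way this could look in NumPy, as a minimal sketch. I am assuming the blanks are stored as `np.nan`; the names `col_means` and `filled` are just for illustration.

```python
import numpy as np

# Incomplete ratings matrix from the text; np.nan marks a missing rating.
A = np.array([[3, np.nan, 2, np.nan],
              [np.nan, 1, 3, np.nan],
              [1, 1, 2, 4],
              [3, 3, np.nan, 3],
              [np.nan, 1, 4, 4]])

# Average of each column (movie), ignoring the missing entries.
col_means = np.nanmean(A, axis=0)

# Wherever A is missing, use that column's mean; otherwise keep the rating.
filled = np.where(np.isnan(A), col_means, A)

print(np.round(filled, 2))
```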

#### Option 2 - row average (average score for person)
Second, we can fill in the missing entries with the average score given by that person. This would lead to
$$
A = \begin{pmatrix}
3 & \underline{2.5} & 2 & \underline{2.5} \\
\underline{2.0} & 1 & 3 & \underline{2.0}\\
1 & 1 & 2 & 4\\
3 & 3 & \underline{3.0} & 3\\
\underline{3.0} & 1 & 4 & 4
\end{pmatrix}.
$$
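
The row version is nearly the same. In this sketch (again assuming blanks are stored as `np.nan`), the only real change is averaging along `axis=1` and keeping the result as a column so it broadcasts across each row.

```python
import numpy as np

# Incomplete ratings matrix from the text; np.nan marks a missing rating.
A = np.array([[3, np.nan, 2, np.nan],
              [np.nan, 1, 3, np.nan],
              [1, 1, 2, 4],
              [3, 3, np.nan, 3],
              [np.nan, 1, 4, 4]])

# Average of each row (person), ignoring missing entries; shape (5, 1).
row_means = np.nanmean(A, axis=1, keepdims=True)

# Wherever A is missing, use that person's mean; otherwise keep the rating.
filled = np.where(np.isnan(A), row_means, A)

print(filled)
```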

We know that neither of these choices will be the best. Each one uses only a very small part of the data and ignores other trends. We want to use whether or not a movie is highly rated, but we also want to take into account other connections in the data. **A low-rank approximation of the data will do this for us.**

But we obviously can't just take a low-rank approximation of the matrix with missing values. Python needs numbers to get an SVD, so we will need to do some work to get there.

Before we walk through and describe the algorithm, I want to show you the data. It's always a good idea to look at your data just to get comfortable with what you have. I am going to do this using the Python script `MovieRatingInfo.py`.

Now that we have seen a little bit about what the data looks like, let's think about the algorithm we are going to use.

The first thing we need to do is notice that we have a bunch of NaNs that we need to deal with. So what we are going to do is initialize the matrix using the row average. In other words, our first guess is that a person would give each movie they haven't seen their own average rating over the movies they have rated. So we will have
$$
A = \begin{pmatrix}
3 & \underline{2.5} & 2 & \underline{2.5} \\
\underline{2.0} & 1 & 3 & \underline{2.0}\\
1 & 1 & 2 & 4\\
3 & 3 & \underline{3.0} & 3\\
\underline{3.0} & 1 & 4 & 4
\end{pmatrix}.
$$
Then, we will *subtract the row mean* from each row. This *normalizes* the data because some people just rate movies higher. We end up at 
$$
A = \begin{pmatrix}
0.5 & 0 & -0.5 & 0 \\
0 & -1 & 1 & 0\\
-1 & -1 & 0 & 2\\
0 & 0 & 0 & 0\\
0 & -2 & 1 & 1
\end{pmatrix}.
$$
Now this data tells us how each rating deviates from that person's own average: if someone rates a movie above their mean, they thought it was good, and if it's below their mean, they thought it was bad.
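
As a quick check of the centering step, here is a sketch that starts from the row-average-filled matrix above. Filling a blank with the row mean doesn't change that row's mean, so every filled-in entry ends up at exactly 0 after the shift. The names `A_filled` and `A_centered` are just mine.

```python
import numpy as np

# Row-average-filled matrix from the text.
A_filled = np.array([[3.0, 2.5, 2.0, 2.5],
                     [2.0, 1.0, 3.0, 2.0],
                     [1.0, 1.0, 2.0, 4.0],
                     [3.0, 3.0, 3.0, 3.0],
                     [3.0, 1.0, 4.0, 4.0]])

# Mean rating of each person, kept as a column so it subtracts row by row.
row_means = A_filled.mean(axis=1, keepdims=True)

# Shift each row so that it is centered at 0.
A_centered = A_filled - row_means

print(A_centered)
```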

Now we can do SVD! Here's the algorithm.
- Take the rank-1 approximation of the shifted $A$ defined above.
- Replace $A$ with its rank-1 approximation.
- Put the known ratings back (we are only trying to fill in the unknown entries, which start at 0; we don't want to overwrite the ratings that we actually know).
- Repeat until the ratings matrix stops changing (within tolerance).
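
The steps above can be sketched as a short loop. This is only an illustration under my own assumptions, not the script we will use later: the function name `complete_rank_1`, the boolean mask `known`, and the stopping parameters `tol` and `max_iter` are all hypothetical.

```python
import numpy as np

def complete_rank_1(A_centered, known, tol=1e-6, max_iter=500):
    """Iteratively fill the unknown entries using rank-1 approximations.

    A_centered: row-mean-centered ratings, unknown entries set to 0.
    known: boolean mask, True where a rating was actually observed.
    tol, max_iter: hypothetical stopping parameters for this sketch.
    """
    X = A_centered.copy()
    for _ in range(max_iter):
        U, S, Vt = np.linalg.svd(X, full_matrices=False)
        X_new = S[0] * np.outer(U[:, 0], Vt[0, :])  # rank-1 approximation
        X_new[known] = A_centered[known]            # put back the known ratings
        if np.max(np.abs(X_new - X)) < tol:         # matrix has stopped changing
            return X_new
        X = X_new
    return X

# Shifted matrix from the text and a mask of which ratings were observed.
A_centered = np.array([[0.5, 0, -0.5, 0],
                       [0, -1, 1, 0],
                       [-1, -1, 0, 2],
                       [0, 0, 0, 0],
                       [0, -2, 1, 1]])
known = np.array([[True, False, True, False],
                  [False, True, True, False],
                  [True, True, True, True],
                  [True, True, False, True],
                  [False, True, True, True]])

completed = complete_rank_1(A_centered, known)
print(np.round(completed, 2))
```

The result is still in shifted form, so adding each person's row mean back would put the filled-in entries on the original rating scale.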

Calculate the rank-1 approximation of $A$ below.

```python
import numpy as np

# Row-mean-centered ratings matrix from above (unknown entries are 0).
A = np.array([[0.5, 0, -0.5, 0], [0, -1, 1, 0], [-1, -1, 0, 2], [0, 0, 0, 0], [0, -2, 1, 1]])
U, S, Vt = np.linalg.svd(A, full_matrices=False)
# Keep only the largest singular value and its pair of singular vectors.
rank_1 = S[0] * np.outer(U[:, 0], Vt[0, :])
print(rank_1)
```

```
[[ 0.05296453  0.18482971 -0.08274423 -0.15505002]
 [-0.20885798 -0.72884925  0.32628989  0.61141734]
 [-0.42766533 -1.49241867  0.66812324  1.25196075]
 [ 0.          0.          0.          0.        ]
 [-0.4741551  -1.65465349  0.74075223  1.38805637]]
```


Let's look at "MovieSVD_Test.py" where I implement the algorithm with 90% of the data. I remove 10% of the data so that I can set up the algorithm on the 90% and then test how good it is at predicting for the 10% of people I randomly removed. Once I know how good it is (the accuracy), I'll add back in the 10% of people I removed and we'll have the complete movie recommendation tool.
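
To give a feel for the hold-out idea, here is a small stand-alone sketch. It is not the contents of "MovieSVD_Test.py". I am assuming that individual ratings are hidden at random, using the row average as a placeholder predictor, and measuring error with root-mean-square error (RMSE); all names and the seed are my own choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Fully observed example matrix from the text, used here as stand-in data.
A = np.array([[3, 4, 2, 1],
              [5, 1, 3, 1],
              [1, 1, 2, 4],
              [3, 3, 3, 3],
              [2, 1, 4, 4]], dtype=float)

# Hide about 10% of the entries (2 of 20 here) to act as the test set.
n_test = max(1, int(round(0.10 * A.size)))
test_idx = rng.choice(A.size, size=n_test, replace=False)
test_mask = np.zeros(A.size, dtype=bool)
test_mask[test_idx] = True
test_mask = test_mask.reshape(A.shape)

A_train = A.copy()
A_train[test_mask] = np.nan  # the algorithm never sees these ratings

# Placeholder predictor: each person's average over the remaining ratings.
row_means = np.nanmean(A_train, axis=1, keepdims=True)
predictions = np.where(np.isnan(A_train), row_means, A_train)

# Compare predictions against the hidden ratings only.
rmse = np.sqrt(np.mean((predictions[test_mask] - A[test_mask]) ** 2))
print(rmse)
```

Swapping the placeholder predictor for the iterative low-rank algorithm gives the same kind of comparison on the held-out ratings.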