# Math 266 Python Exercise 3. Matrices

Instructions:  Read and execute the cells below and follow the instructions.

Execute the cell below to load necessary modules

In [1]:
! pip install wget
import numpy as np
import wget
import pandas as pd
import matplotlib.pyplot as plt
from numpy.linalg import norm
import seaborn as sns
sns.set()


## Matrix Applications
    
Recall that in our last exercise we investigated the use of a matrix to store movie ratings data. The image below shows a partial display of a ratings matrix. Note that User-0 gave Movie-0 a rating of 5. A zero indicates that the user did not rate the movie. For example, User-1 did not rate Movies 1-8.

![ratings](https://github.com/rmartin977/Math-266/blob/main/ratings_matrix.png?raw=1)

Execute the following cell to load the ratings matrix that we will be working with in this exercise.

In [2]:
file_1 = wget.download("https://github.com/rmartin977/Math-266/blob/main/ratings_matrix.npy?raw=true")
ratings = np.load(file_1)

100% [......................................................] 1586254 / 1586254

In [3]:
ratings.shape  # notice there are 1682 movies and 943 users

(1682, 943)

In [4]:
# Below we view the first 5 rows and 15 columns of our ratings matrix.
# It is not necessary to understand the code that generates this output.
rdf = pd.DataFrame(ratings)
rdf.iloc[:5,:15] # User ID-12 gave Movie ID-4 1 star.

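|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|----|----|----|----|----|
| 0 | 5 | 4 | 0 | 0 | 4 | 4 | 0 | 0 | 0 | 4 | 0 | 0 | 3 | 0 | 1 |
| 1 | 3 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 |
| 2 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 3 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 4 | 0 | 5 | 5 | 0 | 0 |
| 4 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |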


In [5]:
file_2 = wget.download("https://github.com/rmartin977/Math-266/blob/main/dictionary.npy?raw=true")

100% [........................................................] 140889 / 140889

In [6]:
titles = np.load(file_2,allow_pickle=True).item()

In [7]:
titles[0]# Note the ID-0 corresponds to "Toy Story"

'Toy Story (1995)'

How to find a rating: to see what rating a given user gave a given movie, just enter:  ratings[movie_id,user_id]

In [8]:
ratings[0,22] # Note that user #22 gave 'Toy Story' 5 stars.

5
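As a small illustration of this indexing convention, the sketch below builds a made-up matrix with 3 movies and 4 users. The name `toy_ratings` and all of its values are invented for this example and are not taken from the data set. It reads a single rating, a whole movie row, and a whole user column.

```python
import numpy as np

# Hypothetical toy matrix: rows are movies, columns are users (0 = not rated).
toy_ratings = np.array([
    [5, 0, 3, 4],   # movie 0
    [0, 2, 0, 1],   # movie 1
    [4, 4, 0, 0],   # movie 2
])

rating = toy_ratings[2, 1]      # rating that user 1 gave movie 2
movie_row = toy_ratings[0]      # all ratings for movie 0 (one entry per user)
user_col = toy_ratings[:, 3]    # all ratings given by user 3 (one entry per movie)
num_raters = np.count_nonzero(toy_ratings[1])  # number of users who rated movie 1

print(rating, movie_row, user_col, num_raters)
```

The same pattern applies to the full matrix: the first index always selects the movie and the second selects the user.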

## The cell below will compute the total score for each movie and store the result in the array "scores". Next the top five scores are determined and the corresponding movies are printed out.

## Modify the cell so the top 10 movies are printed out. This will require just one change below.

In [9]:
scores = ratings.sum(axis=1)
max_5 = np.argsort(scores)[-5:][::-1]
for i,id in enumerate(max_5):
    print(i+1,titles[id])

1 Star Wars (1977)
2 Fargo (1996)
3 Return of the Jedi (1983)
4 Contact (1997)
5 Raiders of the Lost Ark (1981)
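
To see how the `argsort` line picks out the highest scores, here is a minimal sketch on a made-up array of five totals (`toy_scores` is a hypothetical name and the numbers are invented). `np.argsort` returns indices ordered from the lowest score to the highest, so slicing from the end and reversing gives the best entries first.

```python
import numpy as np

# Hypothetical total scores for five movies with ids 0..4.
toy_scores = np.array([12, 40, 7, 33, 25])

order = np.argsort(toy_scores)   # ids sorted from lowest score to highest
top_2 = order[-2:][::-1]         # keep the last two ids, then reverse to highest-first

print(order)
print(top_2)
```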


---
---
# Movie Recommender
---
---
In this part of the exercise we will implement a simple movie recommender system using
cosine similarity. Recall the cosine of the angle between two vectors is given by:

$$\cos(\theta) =\frac{\overrightarrow{a}\cdot\overrightarrow{b}}{||\overrightarrow{a}||\;||\overrightarrow{b}||}$$

If two vectors are normalized (each vector is divided by its magnitude) then both magnitudes equal 1 and the cosine similarity
between the two vectors is simply their dot product.  This dot product returns
a scalar that gives us a measure of how similar the two vectors are. We can thus determine
how similar two movies are by forming the dot product of the normed feature vectors for each
movie. The feature vectors are the rows of the ratings matrix.
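
The claim above can be checked numerically. The following sketch uses three small made-up vectors (not taken from the ratings data) and compares the full cosine formula with the dot product of the normalized vectors.

```python
import numpy as np
from numpy.linalg import norm

# Made-up example vectors, chosen so the arithmetic is easy to follow.
a = np.array([3.0, 4.0, 0.0])
b = np.array([4.0, 3.0, 0.0])
c = np.array([0.0, 0.0, 2.0])   # perpendicular to a

# Cosine similarity from the full formula.
cos_formula = np.dot(a, b) / (norm(a) * norm(b))

# Normalize first, then take a plain dot product.
a_hat = a / norm(a)
b_hat = b / norm(b)
cos_normalized = np.dot(a_hat, b_hat)

# Perpendicular vectors have similarity 0.
cos_perpendicular = np.dot(a, c) / (norm(a) * norm(c))

print(cos_formula, cos_normalized, cos_perpendicular)
```

Because ratings are never negative, the dot product of two rating vectors is never negative either, so similarities computed from the ratings matrix fall between 0 and 1.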


Let us compare the movies Toy_Story and Full_Metal_Jacket.  These movies should *not* be similar.


In [10]:
# first normalize the vector for each movie.  Note Full Metal Jacket has movie id #187

toy_story_norm = ratings[0]/norm(ratings[0])
Full_Metal_Jacket_norm = ratings[187]/norm(ratings[187])


In [11]:
norm(toy_story_norm) # confirm the vector is normalized

1.0

In [12]:
# now compute the similarity between the two movies and display the result.

similarity_1 = np.dot(toy_story_norm,Full_Metal_Jacket_norm)

print(f'The similarity measure between Toy Story and Full Metal Jacket is {similarity_1}')

The similarity measure between Toy Story and Full Metal Jacket is 0.41731579949389364


In [13]:
#  Now we compare the movies Star Wars #49  and Return of the Jedi #180.  These movies should be similar.

Star_Wars_norm = ratings[49]/norm(ratings[49])
Return_of_the_Jedi_norm = ratings[180]/norm(ratings[180])

# now compute the similarity between the two movies and display the result.
similarity_2 = np.dot(Star_Wars_norm,Return_of_the_Jedi_norm)

print(f'\nThe similarity measure between Star Wars and Return of the Jedi is {similarity_2}')


The similarity measure between Star Wars and Return of the Jedi is 0.8844757466059625


## These results show that cosine similarity works: the similar pair scores about 0.88 while the dissimilar pair scores about 0.42.  Cool!

Now we write a Python function called top_five that will take as an input the ID
of a movie that a user likes and print a list of the 5 movies we recommend to the user.
These will be the five movies most "similar" to the input.

For example, when the function is executed with Titanic #312 as input, the output
generated is below. Of course the first movie in the list should be the movie itself.

1. Titanic (1997)
2. Good Will Hunting (1997)
3. Contact (1997)
4. Apt Pupil (1998)
5. Tomorrow Never Dies (1997)

First a little bit about Python functions. A function is simply a block of code that runs when you call it.  Below is an example of a simple function that takes an input number x and returns the square of the input. Note the function definition begins with the keyword "def", then the name of the function and the list of inputs.

```Python
def square(x):
    return x**2
```

To "call" the function you simply type:

square(3)

and 9 will be returned.  For practice enter the above function in a code cell below.  Then in a new cell call the function with any input.

In [14]:
# Enter your code here



The cell below contains the function definition for top_five.  You do not need to understand all the code at this time to complete this assignment.

In [15]:
def top_five(m,i):

    '''
    This function determines the 5 rows in matrix m
    that are most similar to the target row i and prints
    the corresponding movie titles.

    Cosine similarity is used to measure similarity, so the
    rows of matrix m are normalized first.
    '''
    # this line will normalize all the rows in the matrix

    m = m/norm(m,axis=1).reshape(-1,1) 
    
    # this line will compute the similarity between the target movie and all the other movies
    # the result is stored in the array similarities

    similarities = np.dot(m,m[i]) 
    
    # Next we sort the indices from most to least similar and print out the top five.

    top_ids = np.argsort(similarities)[::-1][:5]
    for rank, k in enumerate(top_ids):
        print(rank+1,titles[k])
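
The row-normalization step in top_five relies on broadcasting: `norm(m,axis=1)` gives one length per row, and reshaping it to a column lets NumPy divide every row by its own length. The sketch below shows this on a made-up 4-by-3 matrix (all names and values here are hypothetical). It also adds one guard that is not in the function above: if a row were all zeros, dividing by its norm would produce NaN values, so the sketch swaps zero norms for 1 before dividing.

```python
import numpy as np
from numpy.linalg import norm

# Made-up matrix; row 1 stands for a movie that nobody has rated.
toy = np.array([
    [3.0, 4.0, 0.0],
    [0.0, 0.0, 0.0],
    [4.0, 3.0, 0.0],
    [0.0, 0.0, 5.0],
])

row_norms = norm(toy, axis=1).reshape(-1, 1)            # shape (4, 1): one norm per row
safe_norms = np.where(row_norms == 0, 1.0, row_norms)   # assumption: leave all-zero rows as zeros
unit_rows = toy / safe_norms                            # each row divided by its own norm

# Similarity of every row to row 0, computed in one matrix-vector product.
similarities = unit_rows @ unit_rows[0]

print(similarities)
```

With the guard in place, an unrated row simply gets similarity 0 to everything instead of NaN.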

Now let us test our recommender with the movie "Blade Runner" id #88. The results look quite reasonable.

In [16]:
top_five(ratings,88)

1 Blade Runner (1982)
2 Raiders of the Lost Ark (1981)
3 Alien (1979)
4 Empire Strikes Back, The (1980)
5 Aliens (1986)


Hopefully you agree that most of the movies above are science fiction and similar to "Blade Runner".  I further hope you are as amazed as I am that we can use math to determine how similar movies are!

## Your turn...
Answer the following 6 questions.  Go to Gradescope Python Exercise 2 and enter your answers.

1. What rating did user 25 give to the movie "Star Wars"?
2. What is the title of the movie with ID-143?
3. What movie is most similar to "Blade Runner" ID-88?
4. Determine the similarity score between "Top Gun" ID-160 and "On Golden Pond" ID-161.  Express your answer rounded to 2 decimal places.  Should this score be low?
5. Run top_five on the movie "Room with A View" ID-212. What is the title of the 5th movie returned on the list?
6. What movie comes in 6th place when movies are ranked by overall score? Note "Raiders of the Lost Ark" comes in fifth. See cell [9] above.

Below is a handy function that helps you find the ID for a given movie.  Just enter the title or a word in the title and an Id will be returned if a match is found.  For example when we use search word "Born" the movie Id 52 is returned. When we check we see that is the ID for "Natural Born Killers".

In [17]:
# function to return Id for a given movie
def get_Id(title):
    for Id, val in titles.items():
        if title in val:
            return Id
    return "movie doesn't exist"

In [18]:
get_Id("Born")

52

In [19]:
titles[52]

'Natural Born Killers (1994)'
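
Note that get_Id stops at the first title containing the search word and that the match is case-sensitive. As a possible variation, the sketch below lists every match and ignores case. The function name `find_ids` and the small `toy_titles` dictionary are hypothetical stand-ins made up for this example. To search the real data, you would pass the `titles` dictionary loaded earlier instead.

```python
# Hypothetical stand-in for the `titles` dictionary (ids and entries are made up).
toy_titles = {
    0: "Toy Story (1995)",
    1: "Natural Born Killers (1994)",
    2: "Born Yesterday (1993)",
}

def find_ids(word, title_dict):
    """Return a list of (id, title) pairs whose title contains `word`, ignoring case."""
    word = word.lower()
    return [(movie_id, name) for movie_id, name in title_dict.items()
            if word in name.lower()]

matches = find_ids("born", toy_titles)
no_matches = find_ids("zzz", toy_titles)

print(matches)
print(no_matches)
```

Returning an empty list when nothing matches keeps the return type consistent, which is a design choice of this sketch rather than of get_Id.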