# Math 266 Python Exercise 1. Matrices

You are currently viewing a Jupyter notebook. Recall that a notebook contains text and code cells. This is a text cell. To execute or "run" a cell, hover your mouse over the cell and press the run arrow on its left side, or press Shift+Enter.

When you exit Google Colab, this notebook will not persist on your Google Drive, so you need to save a copy there. To do this, click the File tab above and select "Save a copy in Drive". Once you have done this, you can exit and later return, reload, and edit the notebook further if you like.

Execute the following cell to load the necessary modules.

In [61]:
! pip install wget
import numpy as np
import wget
import pandas as pd




## Create a matrix using numpy

To create the matrix:  

$ \begin{bmatrix}1 & 2 \\3 & 4 \end{bmatrix} $

 we issue the following command.
```Python

mat = np.array([[1,2],[3,4]])

```

Execute the cell below, which creates a 2x2 matrix and displays it. Try experimenting by changing values in the matrix and rerunning the cell.

In [5]:
mat = np.array([[1,2],[3,4]])
print(mat)

[[1 2]
 [3 4]]


In [8]:
mat.shape # this tells us the size or "shape" of the matrix

(2, 2)

In [9]:
3*mat # this command returns a scalar multiple of the matrix

array([[ 3,  6],
       [ 9, 12]])

In [10]:
print(mat) # note that the original matrix is not changed

[[1 2]
 [3 4]]


In [11]:
mat = 3 * mat # this command will change the original matrix
print(mat)

[[ 3  6]
 [ 9 12]]


In [13]:
new_mat = np.arange(1,26).reshape(5,5)  # here we create a new 5X5 matrix
new_mat

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25]])
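
The same pattern works for any size, as long as the number of values matches the number of entries. Here is a minimal sketch on a smaller matrix. The names `values` and `small` are made up for this illustration and are not used elsewhere in the notebook.

```python
import numpy as np

# np.arange(start, stop) counts from start up to, but NOT including, stop.
values = np.arange(1, 7)       # six values: 1, 2, 3, 4, 5, 6

# reshape(2, 3) fills a 2x3 matrix row by row; 2 * 3 must equal the number of values.
small = values.reshape(2, 3)

print(small)
print(small.shape)
```

This is why the cell above uses `np.arange(1,26)` to get the 25 entries of a 5x5 matrix.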

In [55]:
df = pd.DataFrame(new_mat) # below we see the same matrix with rows and columns labeled in bold.
df

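|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| **0** | 1 | 2 | 3 | 4 | 5 |
| **1** | 6 | 7 | 8 | 9 | 10 |
| **2** | 11 | 12 | 13 | 14 | 15 |
| **3** | 16 | 17 | 18 | 19 | 20 |
| **4** | 21 | 22 | 23 | 24 | 25 |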


## Matrix indexing
In the above display we see that Python index values start at zero. To access the (i,j) entry of a matrix called mat, enter:
```python
mat[i,j]

```
Again, be aware that Python uses zero-based indexing. For example, to access the entry in the first row and first column of new_mat, we issue the command below. Experiment by changing the i,j values, rerun the cell, and compare your results with the matrix above.


In [14]:
new_mat[0,0]

1
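
To translate a position written in the usual 1-based math convention into a Python index, subtract one from both the row and the column. The sketch below rebuilds `new_mat` so that it runs on its own. The names `row`, `col`, `entry` and `last` are made up for this illustration.

```python
import numpy as np

new_mat = np.arange(1, 26).reshape(5, 5)

# A position as a math text would write it: row 2, column 3 (1-based).
row, col = 2, 3

# Python counts from zero, so subtract 1 from each index.
entry = new_mat[row - 1, col - 1]
print(entry)   # 8

# The last row and column of a 5x5 matrix have index 4, not 5.
last = new_mat[4, 4]
print(last)    # 25
```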

In [15]:
new_mat[1] # this returns the entire second row ~ experiment!

array([ 6,  7,  8,  9, 10])

In [16]:
new_mat[:,1] # this returns an entire column. Notice, however, that a 1-D array is returned, not a column vector.


array([ 2,  7, 12, 17, 22])

In [17]:
new_mat[:,1].reshape(5,1) # here we convert the array to a column


array([[ 2],
       [ 7],
       [12],
       [17],
       [22]])
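
Comparing shapes makes the difference between a 1-D array and a column visible. The sketch below also shows one alternative to `reshape`, slicing with `1:2`, which keeps the column dimension. This is offered as an option, not as the notebook's required method, and the names `flat`, `col_a` and `col_b` are made up for this illustration.

```python
import numpy as np

new_mat = np.arange(1, 26).reshape(5, 5)

flat = new_mat[:, 1]                  # 1-D array of the second column
col_a = new_mat[:, 1].reshape(-1, 1)  # -1 lets NumPy work out the number of rows
col_b = new_mat[:, 1:2]               # slicing with 1:2 keeps two dimensions

print(flat.shape)    # (5,)
print(col_a.shape)   # (5, 1)
print(col_b.shape)   # (5, 1)
```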

In [20]:
new_mat.sum(axis=0) # returns the sum of each column (axis = 0)

array([55, 60, 65, 70, 75])

In [21]:
new_mat.sum(axis=1) # returns the sum of each row (axis = 1); confirm the sum of the elements in the first row is 15

array([ 15,  40,  65,  90, 115])
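
The `axis` argument is easier to check by hand on a small matrix. Below is a minimal sketch using a made-up 2x3 matrix called `small`. The other names are also just for illustration.

```python
import numpy as np

small = np.array([[1, 2, 3],
                  [4, 5, 6]])

col_sums = small.sum(axis=0)   # add down each column: 1+4, 2+5, 3+6
row_sums = small.sum(axis=1)   # add across each row: 1+2+3, 4+5+6
total = small.sum()            # with no axis, every entry is added together

print(col_sums)   # [5 7 9]
print(row_sums)   # [ 6 15]
print(total)      # 21
```

Note that the result for `axis=0` has one entry per column, and the result for `axis=1` has one entry per row.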

## Matrix Applications
    
In our first application we will investigate the use of a matrix to store movie ratings data. The image below shows a partial display of a ratings matrix. The rows correspond to movies and the columns to users. Note that User-0 gave Movie-0 a rating of 5. A zero indicates that the user did not rate the movie. For example, User-1 did not rate Movies 1-8.

![ratings](https://github.com/rmartin977/Math-266/blob/main/ratings_matrix.png?raw=1)

Execute the following cell to load the ratings matrix that we will be working with in this exercise.

In [31]:
file_1 = wget.download("https://github.com/rmartin977/Math-266/blob/main/ratings_matrix.npy?raw=true")
ratings = np.load(file_1)

In [33]:
ratings.shape  # notice there are 1682 movies and 943 users

(1682, 943)

In [63]:
rdf = pd.DataFrame(ratings) # below we view the first 5 rows and 15 columns of our ratings matrix. It is not necessary to understand the code that generates this output.
rdf.iloc[:5,:15] # User ID-12 gave Movie ID-4 1 star.

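|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **0** | 5 | 4 | 0 | 0 | 4 | 4 | 0 | 0 | 0 | 4 | 0 | 0 | 3 | 0 | 1 |
| **1** | 3 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 |
| **2** | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| **3** | 3 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 4 | 0 | 5 | 5 | 0 | 0 |
| **4** | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |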


To convert movie IDs to titles we need to download a Python dictionary. Execute the following cells.

In [64]:
file_2 = wget.download("https://github.com/rmartin977/Math-266/blob/main/dictionary.npy?raw=true")

In [65]:
titles = np.load(file_2,allow_pickle=True).item()
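
For the curious, the sketch below shows why `allow_pickle=True` and `.item()` appear in the cell above. It assumes the dictionary file was produced with `np.save`, which the notebook does not state. The toy dictionary, the file name and the temporary directory are made up for this illustration.

```python
import os
import tempfile
import numpy as np

# Hypothetical toy dictionary mapping movie IDs to titles.
toy_titles = {0: "Movie A", 1: "Movie B"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_dictionary.npy")

    # np.save wraps the dictionary in a zero-dimensional object array.
    np.save(path, toy_titles)

    # Object arrays are stored with pickle, so loading needs allow_pickle=True.
    loaded = np.load(path, allow_pickle=True)

    # .item() pulls the single stored object (the dictionary) back out.
    recovered = loaded.item()

print(loaded.shape)      # () : a zero-dimensional array
print(type(recovered))   # <class 'dict'>
print(recovered[1])      # Movie B
```

Only load pickled files from sources you trust, since unpickling can run arbitrary code.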

## To determine the title for a given Movie ID just enter titles[ID].  

In [41]:
titles[0] # Note that ID-0 corresponds to "Toy Story"

'Toy Story (1995)'

How to find a rating: to see what rating a given user gave a given movie, just enter ratings[movie_id, user_id]. The movie ID (row) comes first and the user ID (column) second.

In [43]:
ratings[22,0] # Note that user ID-0 gave movie ID-22 4 stars (row = movie, column = user).

4
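
Because the row is the movie and the column is the user, swapping the two indices answers a different question. The sketch below uses a small made-up matrix, `toy_ratings`, rather than the real data. All names in it are hypothetical.

```python
import numpy as np

# Hypothetical ratings: 3 movies (rows) by 4 users (columns); 0 means "not rated".
toy_ratings = np.array([[5, 0, 3, 4],
                        [0, 0, 2, 0],
                        [1, 4, 0, 5]])

movie_id, user_id = 2, 1

one_rating = toy_ratings[movie_id, user_id]   # rating user 1 gave movie 2
swapped = toy_ratings[user_id, movie_id]      # rating user 2 gave movie 1 (a different entry)
user_column = toy_ratings[:, user_id]         # every rating given by user 1
num_rated = np.count_nonzero(toy_ratings[movie_id])  # how many users rated movie 2

print(one_rating)    # 4
print(swapped)       # 2
print(user_column)   # [0 0 4]
print(num_rated)     # 3
```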

Suppose we want to know which movie got the largest overall score. That is, we total the ratings for each movie and determine which one got the largest result. To do this we simply sum each row and find the largest outcome.

In [46]:
scores = ratings.sum(axis=1) # scores is an array of scores for each row
scores.shape # the size of array is the number of movies  1682

(1682,)

In [47]:
scores[:10] # this displays the first 10 scores, notice movie ID-1 got a score of 420.

array([1753,  420,  273,  742,  284,   93, 1489,  875, 1165,  341],
      dtype=uint64)

In [66]:
scores.max() # this tells us the largest value in the scores array is 2541. But which movie does this correspond to? We use the argmax function.

2541

The max function returns the maximum value in an array. The argmax function returns the index, or position, in the array where the maximum occurs. Notice below that the maximum in scores occurs at index value 49.

In [49]:
np.argmax(scores) # movie ID-49 got the largest score

49

In [50]:
scores[49] # Yep ID-49 got 2541 stars

2541

In [51]:
#and the winner is:
titles[49]

'Star Wars (1977)'
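
The whole workflow (sum each row, find the position of the maximum, look up the title) can be traced by hand on a tiny example. The sketch below uses a made-up ratings matrix and a made-up titles dictionary, not the real data.

```python
import numpy as np

# Hypothetical data: 3 movies (rows) by 4 users (columns).
toy_ratings = np.array([[5, 0, 3, 4],
                        [0, 0, 2, 0],
                        [1, 4, 0, 5]])
toy_titles = {0: "Movie A", 1: "Movie B", 2: "Movie C"}

toy_scores = toy_ratings.sum(axis=1)   # one total per movie
best_score = toy_scores.max()          # the largest total
best_id = int(np.argmax(toy_scores))   # the row index where that total occurs

print(toy_scores)           # [12  2 10]
print(best_score)           # 12
print(best_id)              # 0
print(toy_titles[best_id])  # Movie A
```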

This is just the beginning of our exploration of the ratings matrix. Much more later, but enough for now.

## Your turn...
Answer the following 5 questions. Go to Gradescope, Python Exercise 1, and enter your answers.

1. What rating did user 22 give to the movie 'Toy Story'?
2. What is the title of the movie with ID-22?
3. What is the total score for movie ID-22?
4. What user awarded the most points or stars?
5. What movie comes in second place in terms of largest overall score? Hint: look at np.argsort. To get help on a command, type np.argsort? in a code cell and execute it.
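
As a starting point for the hint in question 5, here is a minimal sketch of what `np.argsort` returns on a small made-up array. The names `values` and `order` are hypothetical. Applying the idea to `scores` is left to you.

```python
import numpy as np

values = np.array([30, 10, 50, 20])

# argsort returns the indices that would sort the array from smallest to largest.
order = np.argsort(values)

print(order)           # [1 3 0 2]
print(values[order])   # [10 20 30 50]
```

The first entry of `order` is the index of the smallest value, and the last entry is the index of the largest.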