### Codio Activity 19.5: Distance Based Recommendations

**Expected Time = 60 minutes**

**Total Points = 40**

As another example of a recommendation approach, this activity makes distance-based recommendations using **cosine distance**. From users' ratings of movies, you will build an item-to-item distance matrix, then use those distances to recommend movies that are close to ones a user has rated highly.

#### Index

- [Problem 1](#-Problem-1)
- [Problem 2](#-Problem-2)
- [Problem 3](#-Problem-3)
- [Problem 4](#-Problem-4)

In [9]:
import pandas as pd
import numpy as np

In [10]:
df = pd.read_csv("data/movie_ratings_19_5.csv", index_col=0).tail(10_000)

In [11]:
df.head()

| | movieId | title | userId | rating |
|---|---|---|---|---|
| 44736 | 2535 | Thing, The (1982) | 368 | 3.0 |
| 44737 | 2537 | Thing, The (1982) | 217 | 2.0 |
| 44738 | 2537 | Thing, The (1982) | 288 | 1.0 |
| 44739 | 2537 | Thing, The (1982) | 448 | 3.0 |
| 44740 | 2538 | Thing, The (1982) | 575 | 1.0 |


[Back to top](#-Index)

### Problem 1

#### Pivot Matrix

**10 Points**

Use the pandas `pivot_table` function on `df` to create a table in which the rows are movie titles, the columns are user IDs, and the values are the corresponding ratings. Assign the result to `piv_df`.

**Note**: Fill the missing values in `piv_df` with 0 using `.fillna(0)` before computing distances.

`pivot_table` [documentation](https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html)
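
As a minimal sketch of the pivot step, the toy ratings below (made up for illustration, not taken from the activity's data) show how `pivot_table` turns long-format rows into a title-by-user matrix, and why `.fillna(0)` is needed: any (movie, user) pair without a rating comes out as `NaN`.

```python
import pandas as pd

# Hypothetical toy ratings in the same long format as the activity's df.
toy = pd.DataFrame({
    "title": ["A", "A", "B", "C"],
    "userId": [1, 2, 1, 2],
    "rating": [4.0, 2.0, 5.0, 3.0],
})

# Rows become titles, columns become user IDs, cells hold ratings.
toy_piv = pd.pivot_table(toy, index="title", columns="userId", values="rating")
n_missing = int(toy_piv.isna().sum().sum())  # count of unrated (title, user) pairs

# Replace the missing ratings with 0 so distances can be computed.
toy_piv = toy_piv.fillna(0)
print(toy_piv)
```

Keep in mind that a 0 here stands for "no rating", not a low rating; the distance computation cannot tell the two apart. Also, `pivot_table` averages duplicate (title, userId) pairs by default.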

In [12]:
### GRADED
piv_df = pd.pivot_table(df, index="title", columns="userId", values="rating").fillna(0)

### ANSWER CHECK
piv_df

| title (rows) / userId (columns) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 601 | 602 | 603 | 604 | 605 | 606 | 607 | 608 | 609 | 610 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 'Til There Was You (1997) | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| THX 1138 (1971) | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.5 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.5 | 0.0 | 0.0 | 0.0 | 0.0 |
| TMNT (Teenage Mutant Ninja Turtles) (2007) | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 |
| TV Set, The (2006) | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Thing, The (1982) | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Zoolander 2 (2016) | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Zootopia (2016) | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.5 |
| Zulu (1964) | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| xXx (2002) | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


[Back to top](#-Index)

### Problem 2

#### Cosine Distance

**10 Points**

To measure how similar two movies are, you can compute the distance between their rating vectors (the rows of `piv_df`). Use the scikit-learn function `pairwise_distances` with `metric="cosine"`, and assign the result to `distance_array`.
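
For reference, the cosine distance between two vectors `a` and `b` is `1 - (a · b) / (‖a‖ ‖b‖)`: 0 when the vectors point in the same direction and 1 when they are orthogonal (for example, two movies with no raters in common). The sketch below uses made-up vectors to compute this by hand with NumPy and compares it against `pairwise_distances`; the helper name `cosine_distance` is only for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_distances

def cosine_distance(a, b):
    """1 minus the cosine of the angle between a and b (illustrative helper)."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up rating vectors: one row per movie, one column per user.
ratings = np.array([
    [4.0, 0.0, 2.0],   # movie 0
    [2.0, 0.0, 1.0],   # movie 1: same direction as movie 0, half the scale
    [0.0, 5.0, 0.0],   # movie 2: no raters in common with movie 0
])

by_hand = cosine_distance(ratings[0], ratings[2])
toy_distances = pairwise_distances(ratings, metric="cosine")

print(by_hand)               # orthogonal vectors
print(toy_distances[0, 1])   # parallel vectors: approximately 0
```

Because only the direction of the vectors matters, movies 0 and 1 come out as (nearly) identical even though one set of ratings is half the size of the other.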

In [13]:
from sklearn.metrics.pairwise import pairwise_distances

In [14]:
### GRADED
distance_array = pairwise_distances(piv_df, metric="cosine")


### ANSWER CHECK
distance_array

array([[2.22044605e-16, 7.71252145e-01, 1.00000000e+00, ...,
        1.00000000e+00, 9.21470442e-01, 1.00000000e+00],
       [7.71252145e-01, 2.22044605e-16, 1.00000000e+00, ...,
        8.16593692e-01, 9.64073064e-01, 1.00000000e+00],
       [1.00000000e+00, 1.00000000e+00, 0.00000000e+00, ...,
        1.00000000e+00, 8.22906546e-01, 8.94643410e-01],
       ...,
       [1.00000000e+00, 8.16593692e-01, 1.00000000e+00, ...,
        2.22044605e-16, 1.00000000e+00, 1.00000000e+00],
       [9.21470442e-01, 9.64073064e-01, 8.22906546e-01, ...,
        1.00000000e+00, 0.00000000e+00, 9.45197115e-01],
       [1.00000000e+00, 1.00000000e+00, 8.94643410e-01, ...,
        1.00000000e+00, 9.45197115e-01, 0.00000000e+00]])
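
The diagonal entries above such as `2.22044605e-16` are floating-point round-off rather than real distances: a movie's distance to itself is 0, and 2.22e-16 is the gap between 1.0 and the next double-precision number. Whether the diagonal comes back as exact zeros may depend on the scikit-learn version. If exact zeros matter downstream, one option, sketched here on a small stand-in array rather than the real `distance_array`, is to overwrite the diagonal on a copy.

```python
import numpy as np

# Stand-in for a small symmetric distance matrix with round-off on the diagonal.
eps = np.finfo(np.float64).eps   # double-precision machine epsilon
toy_dist = np.array([
    [eps, 0.77, 1.00],
    [0.77, eps, 0.82],
    [1.00, 0.82, 0.00],
])

# The diagonal is zero up to round-off...
assert np.allclose(np.diag(toy_dist), 0.0)

# ...and can be made exactly zero on a copy (fill_diagonal modifies in place).
cleaned = toy_dist.copy()
np.fill_diagonal(cleaned, 0.0)
print(np.diag(cleaned))
```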

[Back to top](#-Index)

### Problem 3

#### Create a Distance DataFrame

**10 Points**

Using `distance_array`, create a DataFrame named `dist_df` whose index and column labels are both the movie titles.
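
The point of labeling both axes is that distances can then be looked up by title instead of by position. A minimal sketch with made-up titles and distances (none of these values come from the activity's data):

```python
import numpy as np
import pandas as pd

# Made-up titles and a matching 3x3 distance array (illustration only).
titles = pd.Index(["Movie A", "Movie B", "Movie C"], name="title")
toy_array = np.array([
    [0.0, 0.2, 0.9],
    [0.2, 0.0, 0.5],
    [0.9, 0.5, 0.0],
])

# Same labels on both axes, so cell (i, j) is the distance from title i to title j.
toy_dist_df = pd.DataFrame(toy_array, index=titles, columns=titles)

# Label-based lookup replaces positional indexing into the raw array.
print(toy_dist_df.loc["Movie A", "Movie C"])
```

This works because the rows of the array returned by `pairwise_distances` follow the row order of its input, so the index of the pivoted table labels both axes correctly.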

In [15]:
### GRADED
dist_df = pd.DataFrame(distance_array, columns=piv_df.index, index=piv_df.index)

### ANSWER CHECK
dist_df.head()

| title | 'Til There Was You (1997) | THX 1138 (1971) | TMNT (Teenage Mutant Ninja Turtles) (2007) | TV Set, The (2006) | Thing, The (1982) | Thing, The (2011) | Things to Do in Denver When You're Dead (1995) | Think Like a Man (2012) | Thinner (1996) | Third Man, The (1949) | ... | Zombie (a.k.a. Zombie 2: The Dead Are Among Us) (Zombi 2) (1979) | Zombieland (2009) | Zone, The (La Zona) (2007) | Zookeeper (2011) | Zoolander (2001) | Zoolander 2 (2016) | Zootopia (2016) | Zulu (1964) | xXx (2002) | xXx: State of the Union (2005) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 'Til There Was You (1997) | 2.220446e-16 | 0.7712521 | 1.0 | 1.0 | 1.0 | 1.0 | 0.817301 | 1.0 | 1.0 | 1.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 0.893782 | 1.0 | 0.907731 | 1.0 | 0.92147 | 1.0 |
| THX 1138 (1971) | 0.7712521 | 2.220446e-16 | 1.0 | 1.0 | 1.0 | 1.0 | 0.846763 | 1.0 | 1.0 | 1.0 | ... | 1.0 | 0.945805 | 1.0 | 0.7856 | 0.902052 | 1.0 | 0.848035 | 0.816594 | 0.964073 | 1.0 |
| TMNT (Teenage Mutant Ninja Turtles) (2007) | 1.0 | 1.0 | 0.0 | 1.0 | 0.846993 | 1.0 | 1.0 | 0.468888 | 1.0 | 0.85078 | ... | 1.0 | 0.845006 | 1.0 | 1.0 | 0.663249 | 1.0 | 0.795231 | 1.0 | 0.822907 | 0.894643 |
| TV Set, The (2006) | 1.0 | 1.0 | 1.0 | 2.220446e-16 | 0.863569 | 0.575141 | 1.0 | 1.0 | 1.0 | 1.0 | ... | 0.463125 | 0.865746 | 1.0 | 1.0 | 1.0 | 1.0 | 0.815462 | 1.0 | 1.0 | 1.0 |
| Thing, The (1982) | 1.0 | 1.0 | 0.846993 | 0.8635688 | 0.0 | 0.884072 | 1.0 | 1.0 | 0.885604 | 0.867903 | ... | 0.745879 | 0.839 | 1.0 | 0.947807 | 0.843441 | 1.0 | 0.871958 | 1.0 | 0.827047 | 0.880307 |


[Back to top](#-Index)

### Problem 4

#### Using the Distances to Make Recommendations

**10 Points**

Use `dist_df` to decide which movie you would recommend to a user who rated `'xXx (2002)'` highly. In other words, which other movie is closest to it by cosine distance? Assign the title to `recommendation`.

In [16]:
### GRADED
recommendation = dist_df.loc["xXx (2002)", :].sort_values().index[1]
recommendation

'Time Bandits (1981)'
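
Taking position 1 of the sorted row assumes the queried movie itself sorts first. That assumption can fail when another movie is also at distance 0, or when the self-distance is round-off such as `2.22e-16` while another movie sits at exactly 0. A slightly more defensive sketch, shown on made-up data with a hypothetical helper `recommend`, drops the queried title before ranking and can return several candidates.

```python
import pandas as pd

def recommend(dist_df, title, n=1):
    """Return the n titles closest to `title`, excluding `title` itself.

    Hypothetical helper for illustration; not part of the graded solution.
    """
    distances = dist_df.loc[title].drop(title)   # remove the self-distance
    return distances.nsmallest(n).index.tolist()

# Made-up symmetric distance table; Movie A and Movie D are at distance 0.
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
toy_dist_df = pd.DataFrame(
    [[0.0, 0.2, 0.9, 0.0],
     [0.2, 0.0, 0.5, 0.7],
     [0.9, 0.5, 0.0, 0.4],
     [0.0, 0.7, 0.4, 0.0]],
    index=titles, columns=titles,
)

print(recommend(toy_dist_df, "Movie B"))        # closest single title
print(recommend(toy_dist_df, "Movie C", n=2))   # two closest titles
print(recommend(toy_dist_df, "Movie A"))        # tie with the self-distance
```

For `"Movie A"`, the sorted row starts with two zeros, so `sort_values().index[1]` is not guaranteed to skip the movie itself, whereas dropping the title first always does.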