### Codio Activity 19.5: Distance Based Recommendations

**Expected Time = 60 minutes**

**Total Points = 40**

As another example of recommendation approaches, this assignment applies a distance-based method built on **cosine distance**. From the ratings that users gave to each movie, you will build an item-to-item distance matrix. You will then use these distances to recommend movies that are similar to ones a user has rated highly.

#### Index

- [Problem 1](#-Problem-1)
- [Problem 2](#-Problem-2)
- [Problem 3](#-Problem-3)
- [Problem 4](#-Problem-4)

In [2]:
import pandas as pd
import numpy as np

In [4]:
df = pd.read_csv('../data/movie_ratings_19_5.csv', index_col=0)

In [6]:
df.head()

Unnamed: 0,movieId,title,userId,rating
5319,185,Clueless (1995),201,3.0
5320,185,Clueless (1995),219,2.0
5321,185,Clueless (1995),227,3.0
5322,185,Clueless (1995),229,4.0
5323,185,Clueless (1995),232,3.0


[Back to top](#-Index)

### Problem 1

#### Pivot Matrix

**10 Points**

Below, use the `df` DataFrame and the pandas `pivot_table` function to create a table named `piv_df` in which the rows are the movie titles, the columns are the user IDs, and the values are the corresponding ratings.


**Note**: Be sure to fill the missing values of `piv_df` with 0 using `.fillna(0)` before finding distances.

For more information about the `pivot_table` function, see the [official documentation](https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html).
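To see what the pivot does before applying it to the full dataset, here is a minimal sketch on a made-up table. The `toy` DataFrame and its titles are hypothetical; only the column names (`title`, `userId`, `rating`) match `df`.

```python
import pandas as pd

# Hypothetical long-format ratings with the same column names as `df`
toy = pd.DataFrame({
    "title": ["A", "A", "B", "C"],
    "userId": [1, 2, 1, 2],
    "rating": [4.0, 3.0, 5.0, 2.0],
})

# One row per title, one column per user; unrated pairs become NaN, then 0
toy_piv = toy.pivot_table(index="title", columns="userId", values="rating").fillna(0)
print(toy_piv)
```

Movie `B` was never rated by user 2, so that cell is 0 after `.fillna(0)`.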

In [8]:
# Create a pivot table with movie titles as rows, user IDs as columns, and ratings as values
piv_df = df.pivot_table(index='title', columns='userId', values='rating')

# Fill missing values with 0
piv_df = piv_df.fillna(0)

### ANSWER CHECK
piv_df.head()

title \ userId,1,2,3,4,5,6,7,8,9,10,...,601,602,603,604,605,606,607,608,609,610
'Round Midnight (1986),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
'Til There Was You (1997),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Clueless (1995),0.0,0.0,0.0,0.0,0.0,3.0,0.0,4.0,0.0,0.0,...,0.0,3.0,0.0,4.0,0.0,0.0,0.0,3.0,3.0,0.0
Coach Carter (2005),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,4.5,0.0,0.0,0.0,0.0
Coal Miner's Daughter (1980),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


[Back to top](#-Index)

### Problem 2

#### Cosine Distance

**10 Points**

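To measure how similar two movies are, you can compute the cosine distance between their rating vectors (the rows of `piv_df`). A distance near 0 means the two movies have nearly proportional rating patterns; a distance of 1 means no user rated both movies.

Below, use scikit-learn's `pairwise_distances` function with `metric='cosine'` on `piv_df`, and assign the result to `distance_array`.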
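As a quick check of what the metric computes, the sketch below compares a hand-computed cosine distance with the value returned by `pairwise_distances`. The vectors `a` and `b` are hypothetical ratings for two movies across four users, not rows from the dataset.

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_distances

# Hypothetical rating vectors for two movies across four users
a = np.array([3.0, 0.0, 4.0, 0.0])
b = np.array([3.0, 0.0, 0.0, 4.0])

# Cosine distance = 1 - (a . b) / (||a|| * ||b||)
manual = 1.0 - a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

# pairwise_distances expects a 2-D array with one row per item
from_sklearn = pairwise_distances(np.vstack([a, b]), metric="cosine")[0, 1]

print(manual, from_sklearn)
```

Here the dot product is 9 and both norms are 5, so the cosine similarity is 0.36 and the distance is 0.64.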

In [12]:
from sklearn.metrics.pairwise import pairwise_distances

In [13]:
### GRADED
distance_array = pairwise_distances(piv_df, metric='cosine')

### ANSWER CHECK
distance_array.shape

(5075, 5075)

[Back to top](#-Index)

### Problem 3

#### Create a Distance DataFrame

**10 Points**

Using `distance_array`, create a DataFrame named `dist_df` that uses `piv_df.index` for both its index and its column labels.

In [16]:
### GRADED
# Create a DataFrame from distance_array, using piv_df.index as both the index and column names
dist_df = pd.DataFrame(distance_array, index=piv_df.index, columns=piv_df.index)

### ANSWER CHECK
dist_df.head()

title,'Round Midnight (1986),'Til There Was You (1997),Clueless (1995),Coach Carter (2005),Coal Miner's Daughter (1980),Cobb (1994),Cobra (1986),Cocaine Cowboys (2006),Cocktail (1988),Coco (2017),...,Zombieland (2009),"Zone, The (La Zona) (2007)",Zookeeper (2011),Zoolander (2001),Zoolander 2 (2016),Zootopia (2016),Zulu (1964),eXistenZ (1999),xXx (2002),xXx: State of the Union (2005)
'Round Midnight (1986),0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,0.967259,1.0,1.0,1.0,1.0,1.0,1.0
'Til There Was You (1997),1.0,2.220446e-16,0.88436,1.0,1.0,1.0,1.0,1.0,0.862171,1.0,...,1.0,1.0,1.0,0.893782,1.0,0.907731,1.0,1.0,0.92147,1.0
Clueless (1995),1.0,0.8843602,0.0,0.9449909,0.937616,0.888982,0.970826,1.0,0.911382,0.945281,...,0.913059,1.0,0.974194,0.835638,1.0,0.947219,1.0,0.906243,0.803393,1.0
Coach Carter (2005),1.0,1.0,0.944991,1.110223e-16,1.0,1.0,1.0,0.816326,0.885037,0.880089,...,1.0,1.0,1.0,0.748878,0.781806,0.923039,1.0,0.781258,0.938593,1.0
Coal Miner's Daughter (1980),1.0,1.0,0.937616,1.0,0.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,0.947376,1.0,1.0,1.0,1.0,0.863828,1.0
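Diagonal entries such as `2.220446e-16` in the output above are floating-point rounding: the distance from a movie to itself is effectively 0. The sketch below, on a hypothetical three-movie pivot table, shows one way to sanity-check a labeled distance table. The names `toy_piv` and `toy_dist` are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import pairwise_distances

# Hypothetical pivot table: three movies (rows) rated by three users (columns)
toy_piv = pd.DataFrame(
    [[5.0, 0.0, 3.0],
     [4.0, 0.0, 0.0],
     [0.0, 2.0, 0.0]],
    index=["A", "B", "C"], columns=[1, 2, 3],
)

# Label the square distance array with the movie titles on both axes
toy_dist = pd.DataFrame(
    pairwise_distances(toy_piv.to_numpy(), metric="cosine"),
    index=toy_piv.index, columns=toy_piv.index,
)

# Distances are symmetric, and each movie is numerically at distance 0 from itself
is_symmetric = np.allclose(toy_dist.values, toy_dist.values.T)
zero_diagonal = np.allclose(np.diag(toy_dist.values), 0.0)
print(is_symmetric, zero_diagonal)
```

Movies `A` and `C` share no raters, so their distance is 1, which matches the many `1.0` entries in `dist_df`.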


[Back to top](#-Index)

### Problem 4

#### Using the Distances to Make Recommendations

**10 Points**

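Use the `dist_df` DataFrame to decide which movie you would recommend to a user who rated `'xXx (2002)'` highly. In other words, which movie has the smallest cosine distance to it?

HINT: You can use `.nsmallest().index[1]` to find the most similar movie. The entry at position 0 is normally the movie itself, because its distance to itself is 0, so position 1 holds its nearest neighbor.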

In [18]:
### GRADED
recommendation = dist_df['xXx (2002)'].nsmallest(2).index[1]

### ANSWER CHECK
print(recommendation)

Time Bandits (1981)
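The `nsmallest(2).index[1]` approach assumes the queried movie sorts first. If another movie ties at distance 0, or rounding changes the order, that assumption can fail. A minimal alternative sketch is to drop the movie's own label before taking the minimum. The helper `most_similar` and the table `toy_dist` are hypothetical and not part of the graded solution.

```python
import pandas as pd

# Hypothetical 3x3 distance table standing in for `dist_df`
titles = ["A", "B", "C"]
toy_dist = pd.DataFrame(
    [[0.0, 0.2, 0.9],
     [0.2, 0.0, 0.5],
     [0.9, 0.5, 0.0]],
    index=titles, columns=titles,
)

def most_similar(dist, title):
    """Return the label closest to `title`, excluding `title` itself."""
    return dist[title].drop(title).idxmin()

print(most_similar(toy_dist, "A"))  # prints "B"
```

Applied to the real data, the equivalent call would be `most_similar(dist_df, 'xXx (2002)')`.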
