### Recommendations with MovieTweetings: Getting to Know The Data

Throughout this lesson, you will be working with the [MovieTweetings Data](https://github.com/sidooms/MovieTweetings/tree/master/recsyschallenge2014).

**Note:** There are solutions to each of the notebooks available by hitting the orange jupyter logo in the top left of this notebook.  Additionally, you can watch me work through the solutions on the screencasts that follow each workbook. 

To get started, import the libraries and read in the two datasets you will be using throughout the lesson with the code below.

 

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tests as t

%matplotlib inline

# Read in the MovieTweetings dataset originally taken from 
# https://github.com/sidooms/MovieTweetings/tree/master/latest
movies = pd.read_csv('movies.dat', delimiter='::', header=None, names=['movie_id', 'movie', 'genre'], dtype={'movie_id': object}, engine='python')
reviews = pd.read_csv('ratings.dat', delimiter='::', header=None, names=['user_id', 'movie_id', 'rating', 'timestamp'], dtype={'movie_id': object, 'user_id': object, 'timestamp': object}, engine='python')

#### 1. Take a Look At The Data 

Take a look at the data and use your findings to show your understanding of it. The cells below check the size of each dataset, missing values, the number of unique movies, genres, and users, and the distribution of ratings.

In [2]:
movies.head()

|   | movie_id | movie | genre |
|---|---|---|---|
| 0 | 8 | Edison Kinetoscopic Record of a Sneeze (1894) | Documentary\|Short |
| 1 | 10 | La sortie des usines Lumière (1895) | Documentary\|Short |
| 2 | 12 | The Arrival of a Train (1896) | Documentary\|Short |
| 3 | 25 | The Oxford and Cambridge University Boat Race ... | NaN |
| 4 | 91 | Le manoir du diable (1896) | Short\|Horror |


In [3]:
movies.shape

(35479, 3)

In [4]:
reviews.head()

|   | user_id | movie_id | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 114508 | 8 | 1381006850 |
| 1 | 2 | 208092 | 5 | 1586466072 |
| 2 | 2 | 358273 | 9 | 1579057827 |
| 3 | 2 | 10039344 | 5 | 1578603053 |
| 4 | 2 | 6751668 | 9 | 1578955697 |


In [5]:
reviews.isna().sum()

user_id      0
movie_id     0
rating       0
timestamp    0
dtype: int64

In [6]:
reviews.shape

(863866, 4)

In [7]:
len(movies['movie_id'].unique())

35479

In [8]:
len(reviews['movie_id'].unique())

35479

In [9]:
len(movies['genre'].unique())

2737

In [10]:
len(reviews['user_id'].unique())

67353

In [11]:
reviews['rating'].mean()

7.315877693994207

In [12]:
reviews['rating'].describe()

count    863866.000000
mean          7.315878
std           1.853831
min           0.000000
25%           6.000000
50%           8.000000
75%           9.000000
max          10.000000
Name: rating, dtype: float64

#### 2. Data Cleaning

Next, we need to pull some additional relevant information out of the existing columns. 

For each of the datasets, there are a couple of cleaning steps we need to take care of:

#### Movies
* Pull the date from the title and create a new column
* Dummy the date column with 1s and 0s for the century of each movie (1800s, 1900s, and 2000s)
* Dummy the genre column with 1s and 0s for each genre

#### Reviews
* Create a date out of the timestamp

You can check your results against the header of my solution by running the cell below with the **show_clean_dataframes** function.

In [13]:
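# Titles end in "(YYYY)", so take the four characters before the closing parenthesis.
# Splitting on the first "(" would grab the wrong text for a title such as "(500) Days of Summer (2009)".
movies['date'] = movies['movie'].str[-5:-1]
movies.head()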

|   | movie_id | movie | genre | date |
|---|---|---|---|---|
| 0 | 8 | Edison Kinetoscopic Record of a Sneeze (1894) | Documentary\|Short | 1894 |
| 1 | 10 | La sortie des usines Lumière (1895) | Documentary\|Short | 1895 |
| 2 | 12 | The Arrival of a Train (1896) | Documentary\|Short | 1896 |
| 3 | 25 | The Oxford and Cambridge University Boat Race ... | NaN | 1895 |
| 4 | 91 | Le manoir du diable (1896) | Short\|Horror | 1896 |


In [14]:
movies['date'].isna().sum()

0

In [15]:
movies['century'] = [mov[:2]+"00s" for mov in movies['date']]
movies.head()

|   | movie_id | movie | genre | date | century |
|---|---|---|---|---|---|
| 0 | 8 | Edison Kinetoscopic Record of a Sneeze (1894) | Documentary\|Short | 1894 | 1800s |
| 1 | 10 | La sortie des usines Lumière (1895) | Documentary\|Short | 1895 | 1800s |
| 2 | 12 | The Arrival of a Train (1896) | Documentary\|Short | 1896 | 1800s |
| 3 | 25 | The Oxford and Cambridge University Boat Race ... | NaN | 1895 | 1800s |
| 4 | 91 | Le manoir du diable (1896) | Short\|Horror | 1896 | 1800s |


In [16]:
# dtype=int keeps the dummies as 1/0; newer pandas versions default to True/False
movies = pd.concat([movies, pd.get_dummies(movies['century'], dtype=int)], axis=1)
movies.head()

|   | movie_id | movie | genre | date | century | 1800s | 1900s | 2000s |
|---|---|---|---|---|---|---|---|---|
| 0 | 8 | Edison Kinetoscopic Record of a Sneeze (1894) | Documentary\|Short | 1894 | 1800s | 1 | 0 | 0 |
| 1 | 10 | La sortie des usines Lumière (1895) | Documentary\|Short | 1895 | 1800s | 1 | 0 | 0 |
| 2 | 12 | The Arrival of a Train (1896) | Documentary\|Short | 1896 | 1800s | 1 | 0 | 0 |
| 3 | 25 | The Oxford and Cambridge University Boat Race ... | NaN | 1895 | 1800s | 1 | 0 | 0 |
| 4 | 91 | Le manoir du diable (1896) | Short\|Horror | 1896 | 1800s | 1 | 0 | 0 |


In [17]:
genres = []
for val in movies.genre:
    try:
        genres.extend(val.split('|'))
    except AttributeError:
        pass

# we end up needing this later
genres = set(genres)
list(genres)

['Comedy',
 'Sci-Fi',
 'Music',
 'Animation',
 'Action',
 'Horror',
 'Romance',
 'Western',
 'War',
 'Talk-Show',
 'Crime',
 'Biography',
 'Thriller',
 'Adult',
 'Mystery',
 'Musical',
 'Adventure',
 'Reality-TV',
 'Family',
 'Documentary',
 'Fantasy',
 'History',
 'Short',
 'Sport',
 'Film-Noir',
 'Drama',
 'Game-Show',
 'News']

In [18]:
genresdf = pd.DataFrame(data = np.zeros(shape=(movies.shape[0],len(list(genres)))),columns=list(genres))
movies = pd.concat([movies,genresdf],axis=1)
movies.head()

|   | movie_id | movie | genre | date | century | 1800s | 1900s | 2000s | Comedy | Sci-Fi | ... | Family | Documentary | Fantasy | History | Short | Sport | Film-Noir | Drama | Game-Show | News |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8 | Edison Kinetoscopic Record of a Sneeze (1894) | Documentary\|Short | 1894 | 1800s | 1 | 0 | 0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 10 | La sortie des usines Lumière (1895) | Documentary\|Short | 1895 | 1800s | 1 | 0 | 0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 12 | The Arrival of a Train (1896) | Documentary\|Short | 1896 | 1800s | 1 | 0 | 0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 25 | The Oxford and Cambridge University Boat Race ... | NaN | 1895 | 1800s | 1 | 0 | 0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 91 | Le manoir du diable (1896) | Short\|Horror | 1896 | 1800s | 1 | 0 | 0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


In [19]:
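# Return 1 if `gene` is one of the '|'-separated genres in `val`, else 0.
# Matching whole tokens avoids substring false positives such as 'Music' inside 'Musical'.
def split_genres(val, gene):
    try:
        return int(gene in val.split('|'))
    except AttributeError:
        # Missing genres are NaN (a float), which has no .split method
        return 0

# Apply the function once per genre to fill that genre's dummy column
for gene in genres:
    movies[gene] = movies['genre'].apply(split_genres, args=(gene,))
movies.head()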

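|   | movie_id | movie | genre | date | century | 1800s | 1900s | 2000s | Comedy | Sci-Fi | ... | Family | Documentary | Fantasy | History | Short | Sport | Film-Noir | Drama | Game-Show | News |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8 | Edison Kinetoscopic Record of a Sneeze (1894) | Documentary\|Short | 1894 | 1800s | 1 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 10 | La sortie des usines Lumière (1895) | Documentary\|Short | 1895 | 1800s | 1 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | 12 | The Arrival of a Train (1896) | Documentary\|Short | 1896 | 1800s | 1 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 3 | 25 | The Oxford and Cambridge University Boat Race ... | NaN | 1895 | 1800s | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 91 | Le manoir du diable (1896) | Short\|Horror | 1896 | 1800s | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |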
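pandas also offers `Series.str.get_dummies`, which splits each string on a separator and builds one 0/1 column per token, turning missing values into rows of zeros. The sketch below runs it on a small hypothetical frame rather than the real `movies` data. Because it splits on the separator, it matches whole genre names only, so `'Music'` is never counted for a movie labelled `'Musical'`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy genre strings, including a missing value like row 3 of movies
toy = pd.DataFrame({'genre': ['Documentary|Short', 'Musical|Drama', np.nan, 'Music']})

# One 0/1 column per '|'-separated genre (columns come back sorted); NaN rows become all zeros
genre_dummies = toy['genre'].str.get_dummies(sep='|')
toy = pd.concat([toy, genre_dummies], axis=1)
print(toy)
```

The result has the same shape as the loop-based dummies but needs only a single vectorized call.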


In [20]:
import datetime

change_timestamp = lambda val: datetime.datetime.fromtimestamp(int(val)).strftime('%Y-%m-%d %H:%M:%S')

reviews['date'] = reviews['timestamp'].apply(change_timestamp)
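A vectorized alternative is `pd.to_datetime` with `unit='s'`. Note one difference: it interprets epoch seconds as UTC, whereas `datetime.datetime.fromtimestamp` uses the machine's local time zone, so the two results can differ by the local UTC offset. The sketch below uses a hypothetical two-value Series instead of the real `reviews` data.

```python
import pandas as pd

# Hypothetical toy timestamps stored as strings, like reviews['timestamp'] (read with dtype object)
timestamps = pd.Series(['1381006850', '1586466072'])

# Parse the epoch seconds as (naive) UTC datetimes in one vectorized call
dates = pd.to_datetime(timestamps.astype('int64'), unit='s')

# Format as strings only if you need the same text representation as the lambda above
date_strings = dates.dt.strftime('%Y-%m-%d %H:%M:%S')
print(date_strings.tolist())
```

Keeping real datetimes rather than strings also makes later time-window filtering easier.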

### Recommendations with MovieTweetings: Most Popular Recommendation

Now that you have created the columns we will use throughout the rest of the lesson, let's get started with the first of our recommendations.

To get started, compute each movie's most recent rating date, average rating, and number of ratings, then merge these with the `movies` data and rank the movies.

In [21]:
dates = reviews.groupby('movie_id')['date'].max().to_frame().rename(columns={'date':'mostrecent'})
dates.head()

| movie_id | mostrecent |
|---|---|
| 106519 | 2018-09-21 14:55:34 |
| 8 | 2014-04-08 14:20:11 |
| 10 | 2014-10-09 14:15:53 |
| 12 | 2015-08-10 19:16:19 |
| 91 | 2019-07-12 06:48:46 |


In [22]:
avgrating = reviews.groupby('movie_id')['rating'].mean().to_frame().rename(columns={'rating':'avgrating'})
avgrating.head()

| movie_id | avgrating |
|---|---|
| 106519 | 9.0 |
| 8 | 5.0 |
| 10 | 10.0 |
| 12 | 10.0 |
| 91 | 6.0 |


In [23]:
numratings = reviews.groupby('movie_id')['rating'].size().to_frame().rename(columns={'rating':'numrating'})
numratings.head()

| movie_id | numrating |
|---|---|
| 106519 | 1 |
| 8 | 1 |
| 10 | 1 |
| 12 | 1 |
| 91 | 3 |
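The three per-movie statistics can also be computed in one `groupby` with named aggregation. The sketch below uses hypothetical toy reviews with the same column names as `reviews`. It uses `'count'` (non-missing ratings), which matches `size()` here because no ratings are missing.

```python
import pandas as pd

# Hypothetical toy reviews with the same column names as `reviews`
toy_reviews = pd.DataFrame({
    'movie_id': ['8', '91', '91', '91', '10'],
    'rating':   [5, 6, 7, 5, 10],
    'date':     ['2014-04-08 14:20:11', '2019-07-12 06:48:46',
                 '2018-01-01 00:00:00', '2017-05-05 12:00:00', '2014-10-09 14:15:53'],
})

# One pass over the groups computes all three statistics;
# ISO-formatted date strings sort chronologically, so 'max' gives the most recent
stats = toy_reviews.groupby('movie_id').agg(
    avgrating=('rating', 'mean'),
    numrating=('rating', 'count'),
    mostrecent=('date', 'max'),
)
print(stats)
```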


In [24]:
withratings = avgrating.merge(numratings,on='movie_id').merge(dates,on='movie_id').merge(movies,on='movie_id')
bestmovies = withratings[withratings['numrating']>=5].sort_values(by=['avgrating','numrating','mostrecent'],ascending=False)
bestmovies

|   | movie_id | avgrating | numrating | mostrecent | movie | genre | date | century | 1800s | 1900s | ... | Family | Documentary | Fantasy | History | Short | Sport | Film-Noir | Drama | Game-Show | News |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30726 | 4921860 | 10.000000 | 48 | 2016-08-14 13:16:50 | MSG 2 the Messenger (2015) | Comedy\|Drama\|Fantasy\|Horror | 2015 | 2000s | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 31394 | 5262972 | 10.000000 | 28 | 2016-01-07 19:44:43 | Avengers: Age of Ultron Parody (2015) | Short\|Comedy | 2015 | 2000s | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 33423 | 6662050 | 10.000000 | 22 | 2019-04-20 18:29:19 | Five Minutes (2017) | Short\|Comedy | 2017 | 2000s | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 25542 | 2737018 | 10.000000 | 10 | 2015-05-10 18:56:01 | Selam (2013) | Drama\|Romance | 2013 | 2000s | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 32233 | 5804314 | 10.000000 | 7 | 2019-12-25 11:27:47 | Let There Be Light (2017) | Drama | 2017 | 2000s | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16567 | 10367276 | 1.736842 | 95 | 2020-03-23 15:20:07 | The Rat (2019) | Short\|Drama | 2019 | 2000s | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 26577 | 3108604 | 1.666667 | 6 | 2018-10-14 13:47:29 | American Poltergeist (2015) | Horror\|Thriller | 2015 | 2000s | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 26832 | 3187378 | 1.666667 | 6 | 2017-07-11 02:13:16 | The Asian Connection (2016) | Action\|Crime\|Drama\|Thriller | 2016 | 2000s | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 29891 | 4458206 | 1.000000 | 30 | 2018-01-19 23:44:12 | Kod Adi K.O.Z. (2015) | Crime\|Mystery | 2015 | 2000s | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


#### 1. How To Find The Most Popular Movies

This notebook has a single task: no matter who the user is, provide a list of recommendations based simply on the most popular items.

For this task, we will consider what is "most popular" based on the following criteria:

* A movie with the highest average rating is considered best
* With ties, movies that have more ratings are better
* A movie must have a minimum of 5 ratings to be considered among the best movies
* If movies are tied in both average rating and number of ratings, the movie with the most recent rating ranks higher
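The four criteria map directly onto a filter followed by a multi-key sort. Here is a sketch on a hypothetical five-movie summary shaped like the `withratings` frame, not the real data. Movie D has the highest average but is dropped for having fewer than 5 ratings. B beats A on the most recent rating because they are tied on both average rating and count.

```python
import pandas as pd

# Hypothetical per-movie summary, shaped like the `withratings` frame
summary = pd.DataFrame({
    'movie':      ['A', 'B', 'C', 'D', 'E'],
    'avgrating':  [9.0, 9.0, 9.0, 10.0, 8.5],
    'numrating':  [12, 12, 30, 3, 50],
    'mostrecent': ['2019-01-01 00:00:00', '2020-06-01 00:00:00',
                   '2018-03-01 00:00:00', '2020-01-01 00:00:00', '2020-02-01 00:00:00'],
})

# Minimum-ratings criterion: drop movies with fewer than 5 ratings
eligible = summary[summary['numrating'] >= 5]

# Remaining criteria in priority order: average rating, then number of ratings,
# then most recent rating date (ISO-formatted strings sort chronologically)
ranked = eligible.sort_values(by=['avgrating', 'numrating', 'mostrecent'], ascending=False)
print(ranked['movie'].tolist())  # ['C', 'B', 'A', 'E']
```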

With these criteria, the goal for this notebook is to take a **user_id** and return the **n_top** recommendations. Use the function below as the scaffolding for all future recommendations as well.

In [25]:
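def popular_recommendations(user_id, n_top, ranked_movies):
    '''
    INPUT:
    user_id - the user_id of the individual you are making recommendations for
              (unused: popularity-based recommendations are the same for every user)
    n_top - an integer of the number of recommendations you want back
    ranked_movies - a DataFrame of movies already sorted from best to worst, with a 'movie' title column
    OUTPUT:
    top_movies - a list of the n_top recommended movies by movie title in order best to worst
    '''
    # Take the first n_top titles from the pre-ranked DataFrame
    top_movies = list(ranked_movies['movie'].head(n_top))
    return top_movies # a list of the n_top movies as recommended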

Using the four criteria above, you should be able to put together the function above. If you feel confident in your solution, check the results of your function against our solution. On the next page, you can see a walkthrough, and you can of course get the solution by looking at the solution notebook available in this workspace.

In [26]:
# Put your solutions for each of the cases here
ranked_movies = bestmovies
# Top 20 movies recommended for id 1

recs_20_for_1 =  popular_recommendations('1', 20, ranked_movies)# Your solution list here

# Top 5 movies recommended for id 53968
recs_5_for_53968 =  popular_recommendations('53968', 5, ranked_movies)# Your solution list here

# Top 100 movies recommended for id 70000
recs_100_for_70000 =  popular_recommendations('70000', 100, ranked_movies)# Your solution list here

# Top 35 movies recommended for id 43
recs_35_for_43 =  popular_recommendations('43', 35, ranked_movies)# Your solution list here

In [27]:
### You Should Not Need To Modify Anything In This Cell
ranked_movies = t.create_ranked_df(movies, reviews) # only run this once - it is not fast

# check 1 
assert t.popular_recommendations('1', 20, ranked_movies) == recs_20_for_1,  "The first check failed..."
# check 2
assert t.popular_recommendations('53968', 5, ranked_movies) == recs_5_for_53968,  "The second check failed..."
# check 3
assert t.popular_recommendations('70000', 100, ranked_movies) == recs_100_for_70000,  "The third check failed..."
# check 4
assert t.popular_recommendations('43', 35, ranked_movies) == recs_35_for_43,  "The fourth check failed..."

print("If you got here, looks like you are good to go!  Nice job!")

If you got here, looks like you are good to go!  Nice job!


**Notice:** This wasn't the only way we could have determined the "top rated" movies.  You can imagine that in keeping track of trending news or trending social events, you would likely want to create a time window from the current time, and then pull the articles in the most recent time frame.  There are always going to be some subjective decisions to be made.  
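The time-window idea from the notice can be sketched as follows, using hypothetical toy reviews with a datetime `date` column. The function counts each movie's ratings in the `window_days` before a reference time `now` and ranks by that count. The name `trending`, its parameters, and the choice of rating count as the ranking metric are all illustrative assumptions, not part of the lesson.

```python
import pandas as pd

# Hypothetical toy reviews; 'date' holds real datetimes here, not strings
toy_reviews = pd.DataFrame({
    'movie_id': ['1', '1', '2', '2', '2', '3'],
    'rating':   [9, 8, 7, 8, 6, 10],
    'date': pd.to_datetime(['2020-01-01', '2020-03-20', '2020-03-25',
                            '2020-03-28', '2020-03-30', '2019-12-01']),
})

def trending(reviews_df, now, window_days=30, n_top=2):
    '''Hypothetical helper: rank movies by number of ratings in (now - window_days, now].'''
    start = now - pd.Timedelta(days=window_days)
    recent = reviews_df[(reviews_df['date'] > start) & (reviews_df['date'] <= now)]
    counts = recent.groupby('movie_id')['rating'].size()
    return counts.sort_values(ascending=False).head(n_top).index.tolist()

print(trending(toy_reviews, now=pd.Timestamp('2020-03-31')))                 # ['2', '1']
print(trending(toy_reviews, now=pd.Timestamp('2020-03-31'), window_days=7))  # ['2']
```

Shrinking `window_days` makes the list react faster to recent activity. Widening it favors steadily popular movies.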

If you find that no one is paying any attention to your most popular recommendations, then it might be time to find a new way to recommend, which is what the next parts of the lesson should prepare us to do!


### Part II: Adding Filters

Now that you have created a function to give back the **n_top** movies, let's make it a bit more robust.  Add arguments that will act as filters for the movie **year** and **genre**.  

Use the cells below to adjust your existing function to accept **year** and **genre** arguments as **lists** of **strings**. Your final results should then include only movies within the provided years and genres. A movie passes a filter if it matches any value in that list, so the values within each list act as `or` conditions. If no list is provided, no filter should be applied.

You can adjust other inputs as needed to retrieve the final results you are looking for!

Try checking your function against the solution function in our `tests` module (imported as `t`). The call below returns the top 20 movies for user 1 based on the specified year and genre filters. Does yours return the same?

```
t.popular_recs_filtered('1', 20, ranked_movies, years=['2015', '2016', '2017', '2018'], genres=['History'])
```

In [28]:
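def popular_recommendations2(user_id, n_top, ranked_movies, years=None, genres=None):
    '''
    INPUT:
    user_id - the user_id of the individual you are making recommendations for
    n_top - an integer of the number of recommendations you want back
    ranked_movies - a DataFrame of movies sorted best to worst, with 'movie', 'date' and genre dummy columns
    years - a list of year strings to keep (e.g. ['2015', '2016']); None applies no year filter
    genres - a list of genre names; a movie matching any of them is kept; None applies no genre filter
    OUTPUT:
    top_movies - a list of the n_top recommended movies by movie title in order best to worst
    '''
    # Keep only movies released in one of the requested years
    if years is not None:
        ranked_movies = ranked_movies[ranked_movies['date'].isin(years)]

    # Keep movies whose dummy columns match at least one requested genre
    if genres is not None:
        num_genre_match = ranked_movies[genres].sum(axis=1)
        ranked_movies = ranked_movies.loc[num_genre_match > 0, :]

    top_movies = list(ranked_movies['movie'].head(n_top))

    return top_movies # a list of the n_top movies as recommended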

In [29]:
# Top 20 movies recommended for id 1 with years=['2015', '2016', '2017', '2018'], genres=['History']
recs_20_for_1_filtered = popular_recommendations2('1', 20, ranked_movies, years=['2015', '2016', '2017', '2018'], genres=['History'])

# Top 5 movies recommended for id 53968 with no genre filter but years=['2015', '2016', '2017', '2018']
recs_5_for_53968_filtered = popular_recommendations2('53968', 5, ranked_movies, years=['2015', '2016', '2017', '2018'])

# Top 100 movies recommended for id 70000 with no year filter but genres=['History', 'News']
recs_100_for_70000_filtered = popular_recommendations2('70000', 100, ranked_movies, genres=['History', 'News'])

In [30]:
### You Should Not Need To Modify Anything In This Cell

# check 1 
assert t.popular_recs_filtered('1', 20, ranked_movies, years=['2015', '2016', '2017', '2018'], genres=['History']) == recs_20_for_1_filtered,  "The first check failed..."
# check 2
assert t.popular_recs_filtered('53968', 5, ranked_movies, years=['2015', '2016', '2017', '2018']) == recs_5_for_53968_filtered,  "The second check failed..."
# check 3
assert t.popular_recs_filtered('70000', 100, ranked_movies, genres=['History', 'News']) == recs_100_for_70000_filtered,  "The third check failed..."

print("If you got here, looks like you are good to go!  Nice job!")

If you got here, looks like you are good to go!  Nice job!


### Recommendations with MovieTweetings: Collaborative Filtering

One of the most popular methods for making recommendations is **collaborative filtering**. In collaborative filtering, you use the collected user-item interactions (such as ratings) of many users to make new recommendations.

There are two main methods of performing collaborative filtering:

1. **Neighborhood-Based Collaborative Filtering**, which is based on the idea that we can either correlate items that are similar to provide recommendations or we can correlate users to one another to provide recommendations.

2. **Model-Based Collaborative Filtering**, which is based on the idea that we can use machine learning and other mathematical models to understand the relationships that exist amongst items and users to predict ratings and provide recommendations.


In this notebook, you will be working on performing **neighborhood-based collaborative filtering**. There are two main methods for performing neighborhood-based collaborative filtering:

1. **User-based collaborative filtering:** In this type of recommendation, users related to the user you would like to make recommendations for are used to create a recommendation.

2. **Item-based collaborative filtering:** In this type of recommendation, first you need to find the items that are most related to each other item (based on similar ratings).  Then you can use the ratings of an individual on those similar items to understand if a user will like the new item.
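Here is a minimal sketch of both neighborhood approaches on a hypothetical 4-user, 4-item toy matrix, not the MovieTweetings data. `DataFrame.corr` computes Pearson correlations between columns using only the rows where both values are present. Transposing the matrix therefore gives user-user similarities, and the untransposed matrix gives item-item similarities. The neighbor rule used here (recommend what the single most similar user rated) is a deliberately simple assumption chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy user-item matrix (rows: users, columns: items, NaN = not rated)
ratings = pd.DataFrame(
    {'i1': [9, 8, 2, np.nan],
     'i2': [8, 9, 3, 7],
     'i3': [2, 1, 9, 8],
     'i4': [np.nan, 10, 1, 2]},
    index=['u1', 'u2', 'u3', 'u4'],
    dtype=float,
)

# User-based: correlate users (rows) over the items both have rated
user_sim = ratings.T.corr()
target = 'u1'
neighbor = user_sim[target].drop(target).idxmax()

# Recommend items the target has not rated but the most similar user has
target_row = ratings.loc[target]
unseen = target_row[target_row.isna()].index
user_based_recs = [item for item in unseen if pd.notna(ratings.loc[neighbor, item])]
print(neighbor, user_based_recs)  # u2 ['i4']

# Item-based: correlate items (columns) over the users who rated both
item_sim = ratings.corr()
print(item_sim['i2'].drop('i2').idxmax())  # i1
```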

In this notebook you will be implementing **user-based collaborative filtering**. However, it is easy to extend this approach to make recommendations using **item-based collaborative filtering**. We will keep working with the data and libraries loaded earlier.

**NOTE**: Because of the size of the datasets, some of your code cells here will take a while to execute, so be patient!

### Measures of Similarity

When using **neighborhood-based** collaborative filtering, it is important to understand how to measure the similarity of users or items to one another.

There are a number of ways in which we might measure the similarity between two vectors (which might be two users or two items).  In this notebook, we will look specifically at two measures used to compare vectors:

* **Pearson's correlation coefficient**

Pearson's correlation coefficient is a measure of the strength and direction of a linear relationship. Its value lies between -1 and 1, where -1 indicates a perfect negative linear relationship, 1 indicates a perfect positive linear relationship, and values near 0 indicate little linear relationship.

If we have two vectors x and y, we can define the correlation between the vectors as:


$$\text{CORR}(x, y) = \frac{\text{COV}(x, y)}{\text{STDEV}(x)\,\text{STDEV}(y)}$$

where 

$$\text{STDEV}(x) = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

and 

$$\text{COV}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$

where $n$ is the length of the vectors (which must be the same for both $x$ and $y$), and $\bar{x}$ and $\bar{y}$ are the means of the observations in $x$ and $y$, respectively.

We can use the correlation coefficient to indicate how alike two vectors are to one another, where the closer to 1 the coefficient, the more alike the vectors are to one another.  There are some potential downsides to using this metric as a measure of similarity.  You will see some of these throughout this workbook.
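The definitions above can be checked numerically. This sketch implements them term by term with NumPy (the helper name `pearson` is illustrative) and compares the result against `np.corrcoef`. The $n-1$ factors cancel in the ratio, but they are kept here to mirror the formulas.

```python
import numpy as np

def pearson(x, y):
    '''Pearson correlation computed directly from COV(x, y) / (STDEV(x) * STDEV(y)).'''
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    cov = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)
    stdev_x = np.sqrt(np.sum((x - x.mean()) ** 2) / (n - 1))
    stdev_y = np.sqrt(np.sum((y - y.mean()) ** 2) / (n - 1))
    return cov / (stdev_x * stdev_y)

# Two hypothetical users' ratings of the same three movies
user_a = [9, 8, 2]
user_b = [8, 9, 1]
print(pearson(user_a, user_b))        # agrees with np.corrcoef(user_a, user_b)[0, 1]
print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0: a perfect positive linear relationship
```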


* **Euclidean distance**

Euclidean distance is a measure of the straight-line distance from one vector to another. Because this is a measure of distance, larger values indicate that two vectors are more different from one another (the opposite of Pearson's correlation coefficient, where larger values indicate more similar vectors).

Specifically, the Euclidean distance between two vectors x and y is measured as:

$$ \text{EUCL}(x, y) = \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2}$$

Unlike the correlation coefficient, Euclidean distance does not divide by the standard deviations, so no scaling is performed. Therefore, you need to make sure all of your data are on the same scale when using this metric.
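A short NumPy sketch of the Euclidean formula is shown below; the helper name `euclidean` is illustrative. It also contrasts the two measures on a hypothetical user who rates every movie exactly one point lower. The correlation treats the two rating patterns as identical, while the distance still registers the offset.

```python
import numpy as np

def euclidean(x, y):
    '''Straight-line distance, following the EUCL(x, y) definition.'''
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sqrt(np.sum((x - y) ** 2))

x = [9, 8, 2]
y = [6, 4, 2]
print(euclidean(x, y))  # 5.0: the differences are 3, 4 and 0

# A hypothetical user who rates every movie exactly one point lower
shifted = [8, 7, 1]
print(np.corrcoef(x, shifted)[0, 1])  # 1.0 (up to rounding): same pattern of ratings
print(euclidean(x, shifted))          # sqrt(3): the distance still registers the offset
```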

**Note:** Because measuring similarity is often based on the distance between vectors, it is important in these cases to scale your data or to have all data on the same scale. If some measures are on a 5-point scale while others are on a 100-point scale, you are likely to get non-optimal results due to the difference in variability of your features. In this case, we will not need to scale the data because all ratings are on the same 0 to 10 scale, but it is always something to keep in mind!
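To see why scale matters, here is a sketch with hypothetical items described by two features: a 0 to 10 rating and a 0 to 100 score. Dividing each feature by its maximum possible value is one simple rescaling, used here as an assumption rather than the lesson's method. After rescaling, the nearest neighbor of `a` changes.

```python
import numpy as np

# Hypothetical items: [rating on a 0-10 scale, score on a 0-100 scale]
a = np.array([8.0, 40.0])
b = np.array([2.0, 45.0])
c = np.array([8.0, 70.0])

def dist(u, v):
    return np.linalg.norm(u - v)

# Unscaled: the 0-100 feature dominates, so a looks closer to b than to c
print(dist(a, b), dist(a, c))

# Divide each feature by its scale maximum so both lie in [0, 1]
scale = np.array([10.0, 100.0])
print(dist(a / scale, b / scale), dist(a / scale, c / scale))  # now c is closer
```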

------------

### User-Item Matrix

In order to calculate the similarities, it is common to put values in a matrix.  In this matrix, users are identified by each row, and items are represented by columns.  


In the above matrix, you can see that **User 1** and **User 2** both used **Item 1**, and **User 2**, **User 3**, and **User 4** all used **Item 2**.  However, there are also a large number of missing values in the matrix for users who haven't used a particular item.  A matrix with many missing values (like the one above) is considered **sparse**.
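A small hypothetical stand-in for that matrix can be built with `pivot_table` from long-format data that matches the description: users 1 and 2 used item 1, and users 2, 3 and 4 used item 2. A value of 1 marks a use and NaN marks no use. The fraction of NaN cells is one simple way to quantify sparsity.

```python
import pandas as pd

# Hypothetical long-format interactions consistent with the description
interactions = pd.DataFrame({
    'user_id': [1, 2, 2, 3, 4],
    'item_id': [1, 1, 2, 2, 2],
    'used':    [1, 1, 1, 1, 1],
})

# Rows are users, columns are items; user-item pairs with no interaction become NaN
matrix = interactions.pivot_table(index='user_id', columns='item_id', values='used')
print(matrix)

# Fraction of empty cells: the closer to 1, the sparser the matrix
sparsity = matrix.isna().to_numpy().mean()
print(sparsity)  # 0.375 (3 of 8 cells are empty)
```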

Our first goal for this notebook is to create the above matrix with the **reviews** dataset.  However, instead of 1 values in each cell, you should have the actual rating.  

The users will indicate the rows, and the movies will exist across the columns. To create the user-item matrix, we only need the first three columns of the **reviews** dataframe, which you can see by running the cell below.

In [31]:
user_items = reviews[['user_id', 'movie_id', 'rating']]
user_items.head()

|   | user_id | movie_id | rating |
|---|---|---|---|
| 0 | 1 | 114508 | 8 |
| 1 | 2 | 208092 | 5 |
| 2 | 2 | 358273 | 9 |
| 3 | 2 | 10039344 | 5 |
| 4 | 2 | 6751668 | 9 |


In [36]:
user_items.shape

(863866, 3)

### Creating the User-Item Matrix

In order to create the user-items matrix (like the one above), I personally started by using a [pivot table](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.pivot_table.html). 

However, I quickly ran into a memory error (a common theme throughout this notebook).  I will help you navigate around many of the errors I had, and achieve useful collaborative filtering results! 

_____

`1.` Create a matrix where the users are the rows, the movies are the columns, and the ratings exist in each cell, or a NaN exists in cells where a user hasn't rated a particular movie. If you get a memory error (like I did), [this link here](https://stackoverflow.com/questions/39648991/pandas-dataframe-pivot-memory-error) might help you!

In [34]:
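# Group on both keys so each (user_id, movie_id) pair becomes a MultiIndex entry,
# then unstack movie_id into columns; unrated cells become NaN.
# (With as_index=False the groupby returns a flat DataFrame, and unstacking that
# produces one long Series instead of a user-by-movie matrix.)
# The dense result has n_users x n_movies cells, so it may still exceed available memory.
user_by_movie = user_items.groupby(['user_id', 'movie_id'])['rating'].max().unstack()
user_by_movie.head()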
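If the dense matrix still does not fit in memory, one alternative (not used in the rest of this lesson) is a SciPy sparse matrix that stores only the observed ratings. The minimal sketch below runs on hypothetical toy data and assumes `scipy` is installed. `pd.factorize` maps each user and movie id to a row or column position. Two caveats apply. First, the constructor sums duplicate (user, movie) pairs, so aggregate duplicates first. Second, absent cells read as 0, which is indistinguishable from a real rating of 0 (the minimum rating in this data).

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical toy ratings with the same columns as user_items
toy = pd.DataFrame({
    'user_id':  ['1', '2', '2', '3'],
    'movie_id': ['114508', '208092', '114508', '358273'],
    'rating':   [8, 5, 9, 7],
})

# Map string ids to integer positions (codes) plus the ordered unique labels
user_codes, user_labels = pd.factorize(toy['user_id'])
movie_codes, movie_labels = pd.factorize(toy['movie_id'])

# Only the observed ratings are stored; absent cells are implicit zeros, not NaN
sparse_ratings = csr_matrix(
    (toy['rating'].to_numpy(), (user_codes, movie_codes)),
    shape=(len(user_labels), len(movie_labels)),
)
print(sparse_ratings.shape, sparse_ratings.nnz)  # (3, 3) 4
```

`user_labels` and `movie_labels` translate row and column positions back to the original ids.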

`2.` Now that you have a matrix of users by movies, use this matrix to create a dictionary where the key is each user and the value is an array of the movies each user has rated.

In [35]:
# Create a dictionary with users and corresponding movies seen

def movies_watched(user_id):
    '''
    INPUT:
    user_id - the user_id of an individual (a string, since user_id was read in as object dtype)
    OUTPUT:
    movies - an array of the movie_ids the user has rated
    '''
    user_row = user_by_movie.loc[user_id]
    movies = user_row[user_row.notna()].index.values

    return movies


def create_user_movie_dict():
    '''
    INPUT: None
    OUTPUT: movies_seen - a dictionary where each key is a user_id and the value is an array of movie_ids
    
    Creates the movies_seen dictionary
    '''
    movies_seen = dict()

    # Iterate over the actual index labels: user_ids are strings and need not run
    # from 1 to n_users, so looping over range(1, n_users + 1) raises KeyError: 1
    for user_id in user_by_movie.index:
        movies_seen[user_id] = movies_watched(user_id)

    return movies_seen

movies_seen = create_user_movie_dict()
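Looping over tens of thousands of users with `.loc` on a wide matrix is slow. A possible shortcut builds the same kind of dictionary straight from the long-format `user_items` columns with one `groupby`, without the matrix at all. It is sketched below on hypothetical toy data, under the assumption that each array should hold the distinct movie ids a user rated. The arrays come back in order of first appearance rather than the sorted column order of the matrix.

```python
import pandas as pd

# Hypothetical toy long-format ratings, shaped like user_items
toy = pd.DataFrame({
    'user_id':  ['1', '2', '2', '2'],
    'movie_id': ['114508', '208092', '358273', '208092'],
    'rating':   [8, 5, 9, 6],
})

# One entry per user: the array of distinct movie_ids that user rated
movies_seen_fast = {
    user_id: group.unique()
    for user_id, group in toy.groupby('user_id')['movie_id']
}
print(movies_seen_fast)
```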