## Scope

### Store Movies
- Create a class to store movies
- It should store movies by ID and look up a movie's title from its ID
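Here is a minimal sketch of such a class, assuming a plain dict keyed by movie ID. The method names `add_movie` and `get_title` and the `movies` attribute mirror how the notebook cells use the project's `Movies` class. The real implementation in `movies.py` isn't shown, so this is an illustration, not that code.

```python
class Movies:
    """Hypothetical sketch of a movie store keyed by movie ID."""

    def __init__(self):
        # Maps movie ID -> title
        self.movies = {}

    def add_movie(self, movie_id, title):
        # Store (or overwrite) the title for this ID
        self.movies[movie_id] = title

    def get_title(self, movie_id):
        # Return the title, or None if the ID is unknown
        return self.movies.get(movie_id)


movies = Movies()
movies.add_movie(1, "Toy Story (1995)")
movies.add_movie(2, "Jumanji (1995)")
print(movies.get_title(2))  # Jumanji (1995)
print(len(movies.movies))   # 2
```

A dict gives constant-time average lookup by ID. Returning `None` for unknown IDs is a design choice; raising `KeyError` would be equally reasonable.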

### Create Users
- From the data, create one user object per user.
- Each user has an ID and stores the movies they've rated, along with the ratings.
- Users can be compared to compute a similarity score.
  - The metric can be decided later; candidates:
    - Euclidean distance (converted to a similarity, e.g. 1 / (1 + distance))
    - cosine similarity
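Both candidate metrics can be sketched over two users' rating dicts (mapping movie ID to rating). This sketch makes two assumptions: it compares only the movies both users rated, and it maps Euclidean distance to a similarity with `1 / (1 + d)` so that larger always means more similar. The function names are hypothetical.

```python
import math


def euclidean_similarity(ratings_a, ratings_b):
    """Similarity in (0, 1] from the Euclidean distance over co-rated movies."""
    common = ratings_a.keys() & ratings_b.keys()
    if not common:
        return 0.0  # no overlap: treat as dissimilar (an assumption)
    distance = math.sqrt(sum((ratings_a[m] - ratings_b[m]) ** 2 for m in common))
    # Identical ratings -> 1.0; larger distances approach 0
    return 1.0 / (1.0 + distance)


def cosine_similarity(ratings_a, ratings_b):
    """Cosine of the angle between the two rating vectors over co-rated movies."""
    common = ratings_a.keys() & ratings_b.keys()
    if not common:
        return 0.0
    dot = sum(ratings_a[m] * ratings_b[m] for m in common)
    norm_a = math.sqrt(sum(ratings_a[m] ** 2 for m in common))
    norm_b = math.sqrt(sum(ratings_b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)


a = {1: 4.0, 2: 3.0, 3: 5.0}
c = {1: 2.0, 2: 3.0}
print(round(euclidean_similarity(a, c), 3))  # 0.333
print(round(cosine_similarity(a, c), 3))     # 0.943
```

Restricting to co-rated movies is a design choice. The alternative treats unrated movies as 0, which penalizes users who simply rated different films.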

### Recommend movies to new users
- Compute the 10 nearest neighbors (most similar users) of the new user
- Collect every movie that those neighbors have rated but the new user has not
- For each of those movies, predict the rating the new user would give it:
  - initialize a `score` (float) variable to 0
  - initialize a `similarities_sum` (float) variable to 0
  - for each neighbor who has rated the movie, add the neighbor's similarity * their rating to `score`, and add the neighbor's similarity to `similarities_sum`
  - divide `score` by `similarities_sum` and return it
- Return the 10 movies with the highest predicted ratings as recommendations for the new user
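The prediction steps above amount to a similarity-weighted average of the neighbors' ratings. This minimal sketch assumes each neighbor is given as a `(similarity, ratings_dict)` pair. `predict_rating` and `recommend` are hypothetical names, not the project's `Recommender` API. Returning 0.0 when the similarities sum to zero is also an assumption.

```python
def predict_rating(movie_id, neighbors):
    """Similarity-weighted average of the neighbors' ratings for one movie.

    `neighbors` is a list of (similarity, ratings_dict) pairs.
    """
    score = 0.0
    similarities_sum = 0.0
    for similarity, ratings in neighbors:
        if movie_id in ratings:  # only neighbors who rated this movie contribute
            score += similarity * ratings[movie_id]
            similarities_sum += similarity
    if similarities_sum == 0:
        return 0.0  # avoid division by zero (an assumption; could also skip the movie)
    return score / similarities_sum


def recommend(new_user_ratings, neighbors, top_n=10):
    """Rank movies the neighbors rated but the new user has not."""
    candidates = set()
    for _, ratings in neighbors:
        candidates |= ratings.keys() - new_user_ratings.keys()
    predictions = {m: predict_rating(m, neighbors) for m in candidates}
    # Highest predicted rating first
    return sorted(predictions, key=predictions.get, reverse=True)[:top_n]


neighbors = [(0.9, {10: 5.0, 11: 2.0}), (0.5, {10: 3.0, 12: 4.0})]
print(recommend({11: 3.0}, neighbors))  # [10, 12]
```

Here movie 10 is predicted at (0.9 * 5 + 0.5 * 3) / (0.9 + 0.5) ≈ 4.29 and movie 12 at 4.0. Movie 11 is excluded because the new user has already rated it.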

```python
import pandas as pd
import numpy as np
from users import Users
from movies import Movies
from recommender import Recommender
```

```python
# import movies csv (genre does not matter for v1)
movies_df = pd.read_csv('movielens/movies.csv').drop('genres', axis=1)
print('Shape:', movies_df.shape)
movies_df.head()
```

```
Shape: (9125, 2)
```

|   | movieId | title |
|---|---------|-------|
| 0 | 1 | Toy Story (1995) |
| 1 | 2 | Jumanji (1995) |
| 2 | 3 | Grumpier Old Men (1995) |
| 3 | 4 | Waiting to Exhale (1995) |
| 4 | 5 | Father of the Bride Part II (1995) |


```python
# import ratings csv (timestamp is not important)
ratings_df = pd.read_csv('movielens/ratings.csv').drop('timestamp', axis=1)
print(ratings_df.shape)
ratings_df.head()
```

```
(100004, 3)
```

|   | userId | movieId | rating |
|---|--------|---------|--------|
| 0 | 1 | 31 | 2.5 |
| 1 | 1 | 1029 | 3.0 |
| 2 | 1 | 1061 | 3.0 |
| 3 | 1 | 1129 | 2.0 |
| 4 | 1 | 1172 | 4.0 |


### How many unique movies and users are in our dataset?

```python
num_users = ratings_df['userId'].nunique()
num_movies = len(movies_df)
print(f"Number of users: {num_users}")
print(f"Number of movies: {num_movies}")
```

```
Number of users: 671
Number of movies: 9125
```


### Add all movies to a Movies object

```python
movies = Movies()
for movie_id, movie_title in zip(movies_df['movieId'], movies_df['title']):
    movies.add_movie(movie_id, movie_title)
```

### Add all user ratings to a Users object

```python
users = Users()
for user_id, movie_id, rating in zip(ratings_df['userId'], ratings_df['movieId'], ratings_df['rating']):
    users.add_user_rating(user_id, movie_id, rating)
```
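The notebook shows `Users` only through `add_user_rating`, `get_user_by_id`, the `users` attribute, and each user's `movie_ratings`. The actual `users.py` isn't included. Here is a minimal sketch of a container with that interface, assuming dicts keyed by ID. The `User` class and its `id` attribute are hypothetical.

```python
class User:
    """Hypothetical sketch: one user and the movies they've rated."""

    def __init__(self, user_id):
        self.id = user_id
        self.movie_ratings = {}  # movie ID -> rating


class Users:
    """Hypothetical sketch: all users, keyed by user ID."""

    def __init__(self):
        self.users = {}  # user ID -> User

    def add_user_rating(self, user_id, movie_id, rating):
        # Create the user the first time their ID appears, then record the rating
        user = self.users.get(user_id)
        if user is None:
            user = User(user_id)
            self.users[user_id] = user
        user.movie_ratings[movie_id] = rating

    def get_user_by_id(self, user_id):
        # Return the User, or None if the ID is unknown
        return self.users.get(user_id)


# First three rows of ratings.csv shown above
users = Users()
for user_id, movie_id, rating in [(1, 31, 2.5), (1, 1029, 3.0), (1, 1061, 3.0)]:
    users.add_user_rating(user_id, movie_id, rating)
print(len(users.users))  # 1
```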

### Check that the number of users and movies matches the DataFrames

```python
print(f"Number of users: {len(users.users)}")
print(f"Number of movies: {len(movies.movies)}")
```

```
Number of users: 671
Number of movies: 9125
```


### Fit the recommender to the data

```python
r = Recommender()
```

```python
r.fit(users, movies)
```

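### Inspect a user from the dataset
For simplicity, we'll look at the first user (ID 1), list their ratings from highest to lowest, and treat them as the "new" user for the recommendations that follow.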

```python
new_user = users.get_user_by_id(1)
# Sort by (rating, movie_id), highest rating first
for movie_id, rating in sorted(new_user.movie_ratings.items(), reverse=True, key=lambda item: item[::-1]):
    print(rating, r.movies.get_title(movie_id))
```

```
4.0 Tron (1982)
4.0 French Connection, The (1971)
4.0 Cinema Paradiso (Nuovo cinema Paradiso) (1989)
3.5 Dracula (Bram Stoker's Dracula) (1992)
3.0 Blazing Saddles (1974)
3.0 Gods Must Be Crazy, The (1980)
3.0 Sleepers (1996)
3.0 Dumbo (1941)
2.5 Fly, The (1986)
2.5 Star Trek: The Motion Picture (1979)
2.5 Dangerous Minds (1995)
2.0 Antz (1998)
2.0 Willow (1988)
2.0 Cape Fear (1991)
2.0 Gandhi (1982)
2.0 Ben-Hur (1959)
2.0 Deer Hunter, The (1978)
2.0 Escape from New York (1981)
1.0 Time Bandits (1981)
1.0 Beavis and Butt-Head Do America (1996)
```


### Recommend new movies to the user using KNN (approach 1)

```python
r.recommend(new_user)
```

```
['Chronicles of Narnia: The Lion, the Witch and the Wardrobe, The (2005)',
 'Proof (2005)',
 'Ali G Indahouse (2002)',
 'Lord of the Rings: The Return of the King, The (2003)',
 'Lord of the Rings: The Fellowship of the Ring, The (2001)',
 'Shane (1953)',
 'Titus (1999)',
 'Mister Roberts (1955)',
 'King Kong (1933)',
 'Strangers on a Train (1951)']
```
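The `recommend` output doesn't reveal how neighbors are chosen. One plausible way to pick the 10 nearest neighbors described in the Scope section is `heapq.nlargest` over similarity scores. `nearest_neighbors` and the toy `overlap` similarity are hypothetical; any similarity function over two rating dicts could be passed in.

```python
import heapq


def nearest_neighbors(target_id, all_ratings, similarity, k=10):
    """Return the k (similarity, user_id) pairs most similar to target_id.

    all_ratings maps user ID -> {movie ID: rating}; similarity is any
    function taking two such rating dicts and returning a float.
    """
    target = all_ratings[target_id]
    scored = (
        (similarity(target, ratings), user_id)
        for user_id, ratings in all_ratings.items()
        if user_id != target_id  # a user is not their own neighbor
    )
    # Keeps only the k best without sorting every user
    return heapq.nlargest(k, scored)


def overlap(ratings_a, ratings_b):
    # Toy stand-in similarity: number of movies both users rated
    return len(ratings_a.keys() & ratings_b.keys())


all_ratings = {
    1: {10: 4.0, 11: 3.0},
    2: {10: 4.0, 11: 3.0},
    3: {10: 1.0},
    4: {12: 5.0},
}
print(nearest_neighbors(1, all_ratings, overlap, k=2))  # [(2, 2), (1, 3)]
```

User 1 is part of the fitted data, so excluding the target user from its own neighbor list matters here. Whether the notebook's `Recommender` does this isn't shown.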

### Recommend new movies to the user using a cosine similarity matrix (approach 2)
This approach is very slow, so it prints its progress in 10% steps while it runs.

```python
r.recommend_cosine_similarity_matrix(new_user)
```

```
10%
20%
30%
40%
50%
60%
70%
80%
90%
```

```
['Star Wars: Episode IV - A New Hope (1977)',
 'Pulp Fiction (1994)',
 'Star Wars: Episode V - The Empire Strikes Back (1980)',
 'Shawshank Redemption, The (1994)',
 'Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)',
 'Silence of the Lambs, The (1991)',
 'Godfather, The (1972)',
 'Back to the Future (1985)',
 'Fargo (1996)',
 'Forrest Gump (1994)']
```
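The notebook doesn't show how `recommend_cosine_similarity_matrix` builds its matrix. One common way to compute every user-user cosine similarity without Python-level loops is to L2-normalize each row of a dense user × movie rating matrix and take a single matrix product with NumPy. This sketch assumes unrated movies are stored as 0. Unlike the co-rated-only comparison, this counts every movie, and `cosine_similarity_matrix` is a hypothetical name.

```python
import numpy as np


def cosine_similarity_matrix(rating_matrix):
    """All pairwise user-user cosine similarities.

    rating_matrix: (n_users, n_movies) array, 0 where a user has no rating.
    """
    norms = np.linalg.norm(rating_matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for users with no ratings
    normalized = rating_matrix / norms
    # One matrix product yields every pairwise dot product of unit rows at once
    return normalized @ normalized.T


ratings = np.array([
    [4.0, 3.0, 0.0],
    [4.0, 3.0, 0.0],
    [0.0, 0.0, 5.0],
])
sim = cosine_similarity_matrix(ratings)
print(np.round(sim, 3))
```

For this dataset, the dense matrix would have 671 × 9125 ≈ 6.1 million entries (roughly 49 MB as float64), which fits comfortably in memory.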