<a href="https://colab.research.google.com/github/msr0b0tjennica/movies-recommendation-system/blob/main/MovieRecommendationSystem.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In a world overflowing with cinematic choices, finding the right movie can feel overwhelming. This Movie Recommendation System uses collaborative filtering, specifically matrix factorization with the SVD algorithm from the Surprise library, to predict how a user would rate movies they have not seen yet, based on their past ratings. By learning patterns in how users rate movies, the system suggests titles that match individual tastes. A second, content-based recommender finds movies whose genres are similar to a given title. Whether you're a casual viewer or a cinephile, this project shows how machine learning can make movie discovery easier. Let’s dive into the world of AI-powered recommendations!

In [1]:
# 'surprise' is a thin wrapper that installs scikit-surprise; nltk is not used below
!pip install pandas numpy scikit-learn surprise nltk


Collecting surprise
  Downloading surprise-0.1-py2.py3-none-any.whl.metadata (327 bytes)
Collecting scikit-surprise (from surprise)
  Downloading scikit_surprise-1.1.4.tar.gz (154 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 154.4/154.4 kB 3.1 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Downloading surprise-0.1-py2.py3-none-any.whl (1.8 kB)
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (pyproject.toml) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.4-cp311-cp311-linux_x86_64.whl size=2505184 sha256=6d6ee59f52ee56136a8a511d26d6cc47b1c98aa68fe1dc02d842803db7c9093a
  Stored in directory: /root/.cache/pip/wheels/2a/8f/6e/7e2899163e2d85d8266daab4aa1cdabec7a6c56f83c015b5af
Successfully built scikit-surprise
Install

In [2]:
!wget -O ml-latest-small.zip https://files.grouplens.org/datasets/movielens/ml-latest-small.zip
!unzip ml-latest-small.zip


--2025-03-02 19:37:59--  https://files.grouplens.org/datasets/movielens/ml-latest-small.zip
Resolving files.grouplens.org (files.grouplens.org)... 128.101.65.152
Connecting to files.grouplens.org (files.grouplens.org)|128.101.65.152|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 978202 (955K) [application/zip]
Saving to: ‘ml-latest-small.zip’


2025-03-02 19:38:02 (952 KB/s) - ‘ml-latest-small.zip’ saved [978202/978202]

Archive:  ml-latest-small.zip
   creating: ml-latest-small/
  inflating: ml-latest-small/links.csv  
  inflating: ml-latest-small/tags.csv  
  inflating: ml-latest-small/ratings.csv  
  inflating: ml-latest-small/README.txt  
  inflating: ml-latest-small/movies.csv  
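If `wget` or `unzip` are unavailable (for example, when running outside Colab), the same download-and-extract step can be done with Python's standard library. This is a minimal sketch, not part of the original workflow: the helper name `download_and_extract` is hypothetical, it skips the download when the zip file already exists, and `zipfile.extractall` overwrites previously extracted files without prompting.

```python
import os
import urllib.request
import zipfile

def download_and_extract(url, zip_path, dest_dir="."):
    """Download url to zip_path unless it already exists, then extract it into dest_dir."""
    if not os.path.exists(zip_path):
        # Fetch the archive only when it is not already cached locally
        urllib.request.urlretrieve(url, zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)  # overwrites existing files without prompting
        return zf.namelist()

# Example (downloads roughly 1 MB):
# download_and_extract(
#     "https://files.grouplens.org/datasets/movielens/ml-latest-small.zip",
#     "ml-latest-small.zip",
# )
```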


#### Load the dataset

In [3]:
import pandas as pd

# Load the movies and ratings datasets
movies = pd.read_csv("ml-latest-small/movies.csv")
ratings = pd.read_csv("ml-latest-small/ratings.csv")

# Display first few rows
print(movies.head())
print(movies.columns)
print(ratings.head())


   movieId                               title  \
0        1                    Toy Story (1995)   
1        2                      Jumanji (1995)   
2        3             Grumpier Old Men (1995)   
3        4            Waiting to Exhale (1995)   
4        5  Father of the Bride Part II (1995)   

                                        genres  
0  Adventure|Animation|Children|Comedy|Fantasy  
1                   Adventure|Children|Fantasy  
2                               Comedy|Romance  
3                         Comedy|Drama|Romance  
4                                       Comedy  
Index(['movieId', 'title', 'genres'], dtype='object')
   userId  movieId  rating  timestamp
0       1        1     4.0  964982703
1       1        3     4.0  964981247
2       1        6     4.0  964982224
3       1       47     5.0  964983815
4       1       50     5.0  964982931
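Collaborative filtering works on the user × movie rating matrix, and most of that matrix is empty. A quick check is to count users, movies and ratings and compute the fraction of cells with no rating. A minimal pandas sketch, assuming each (userId, movieId) pair appears at most once; the helper `rating_matrix_stats` and the toy data are hypothetical.

```python
import pandas as pd

def rating_matrix_stats(ratings: pd.DataFrame) -> dict:
    """Summarize a long-format ratings table with userId and movieId columns."""
    n_users = ratings["userId"].nunique()
    n_items = ratings["movieId"].nunique()
    n_ratings = len(ratings)
    # Fraction of the user x item matrix with no observed rating
    sparsity = 1.0 - n_ratings / (n_users * n_items)
    return {"users": n_users, "items": n_items, "ratings": n_ratings, "sparsity": sparsity}

toy = pd.DataFrame({
    "userId": [1, 1, 2, 3],
    "movieId": [10, 20, 10, 30],
    "rating": [4.0, 3.5, 5.0, 2.0],
})
print(rating_matrix_stats(toy))
```

On the notebook's data, the same call would be `rating_matrix_stats(ratings)`.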


In [4]:
# Merge movies and ratings
df = pd.merge(ratings, movies, on='movieId')

# Drop timestamp (not needed)
df = df.drop(columns=['timestamp'])

print(df.head())


   userId  movieId  rating                        title  \
0       1        1     4.0             Toy Story (1995)   
1       1        3     4.0      Grumpier Old Men (1995)   
2       1        6     4.0                  Heat (1995)   
3       1       47     5.0  Seven (a.k.a. Se7en) (1995)   
4       1       50     5.0   Usual Suspects, The (1995)   

                                        genres  
0  Adventure|Animation|Children|Comedy|Fantasy  
1                               Comedy|Romance  
2                        Action|Crime|Thriller  
3                             Mystery|Thriller  
4                       Crime|Mystery|Thriller  
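`pd.merge` defaults to an inner join, so `df` only contains movies that have at least one rating; movies without ratings are silently dropped. The toy example below (hypothetical data) shows that behavior, a `validate` check that the movies table has one row per `movieId`, and a right join that keeps unrated movies.

```python
import pandas as pd

movies_toy = pd.DataFrame({"movieId": [1, 2, 3], "title": ["A", "B", "C"]})
ratings_toy = pd.DataFrame({
    "userId": [1, 1, 2],
    "movieId": [1, 3, 3],
    "rating": [4.0, 5.0, 3.0],
})

# Default how='inner': movie 2 has no ratings, so it does not appear in the result
inner = pd.merge(ratings_toy, movies_toy, on="movieId")

# validate='many_to_one' raises pandas.errors.MergeError if movieId repeats in movies_toy
checked = pd.merge(ratings_toy, movies_toy, on="movieId", validate="many_to_one")

# how='right' keeps every movie; an unrated movie gets NaN for userId and rating
all_movies = pd.merge(ratings_toy, movies_toy, on="movieId", how="right")
print(inner)
print(all_movies)
```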


In [5]:
# Peek at ratings from users other than user 1
print(df[df['userId']>1].head())


     userId  movieId  rating                             title  \
232       2      318     3.0  Shawshank Redemption, The (1994)   
233       2      333     4.0                  Tommy Boy (1995)   
234       2     1704     4.5          Good Will Hunting (1997)   
235       2     3578     4.0                  Gladiator (2000)   
236       2     6874     4.0          Kill Bill: Vol. 1 (2003)   

                     genres  
232             Crime|Drama  
233                  Comedy  
234           Drama|Romance  
235  Action|Adventure|Drama  
236   Action|Crime|Thriller  


In [6]:
from surprise import Dataset, Reader
from surprise.model_selection import train_test_split

# Define rating scale
reader = Reader(rating_scale=(0.5, 5.0))

# Load dataset into Surprise
data = Dataset.load_from_df(df[['userId', 'movieId', 'rating']], reader)

# Split into training and testing sets
trainset, testset = train_test_split(data, test_size=0.2)
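`train_test_split` holds out a random 20% of individual ratings, not of users, so the same user usually appears in both the training and test sets. It also reshuffles on every run unless a seed is fixed (Surprise's `train_test_split` accepts a `random_state` argument). Below is a minimal pure-Python sketch of a seeded holdout split on (user, item, rating) triples; `holdout_split` is a hypothetical helper, and its rounding of the test size may differ from Surprise's.

```python
import random

def holdout_split(triples, test_size=0.2, seed=None):
    """Shuffle (user, item, rating) triples and hold out a fraction of them as a test set."""
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    shuffled = list(triples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]

toy_ratings = [(1, 10, 4.0), (1, 20, 3.5), (1, 30, 5.0), (2, 10, 2.0), (2, 30, 4.5)]
train, test = holdout_split(toy_ratings, test_size=0.2, seed=0)
# Individual ratings are split, so a user can appear in both sets
print(sorted({u for u, _, _ in train} & {u for u, _, _ in test}))
```

In the notebook itself, passing a seed such as `train_test_split(data, test_size=0.2, random_state=42)` would make the reported RMSE repeatable (the value 42 is arbitrary).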


In [7]:
from surprise import SVD
from surprise import accuracy

# Train the SVD (matrix factorization) model on the training split
model = SVD()
model.fit(trainset)

# Predict ratings for the held-out test ratings
predictions = model.test(testset)

# Evaluate performance; accuracy.rmse prints "RMSE: ..." and also returns the value
rmse = accuracy.rmse(predictions)
print("Root Mean Squared Error (RMSE):", rmse)


RMSE: 0.8704
Root Mean Squared Error (RMSE): 0.8703714378438
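Surprise documents its `SVD` algorithm as biased matrix factorization: a predicted rating is $\hat r_{ui} = \mu + b_u + b_i + q_i^\top p_u$, where $\mu$ is the global mean rating, $b_u$ and $b_i$ are user and item biases, and $p_u$, $q_i$ are latent factor vectors fit by stochastic gradient descent with L2 regularization. RMSE is $\sqrt{\frac{1}{|T|}\sum_{(u,i)\in T}(r_{ui}-\hat r_{ui})^2}$ over the test ratings $T$. The NumPy sketch below implements that idea from scratch for illustration only; it is not Surprise's code, and the function names, hyperparameter values and toy ratings are assumptions. Unknown users or items contribute zero bias and no factor term, and estimates are clipped to the 0.5 to 5.0 scale, following Surprise's documented behavior.

```python
import numpy as np

def train_biased_mf(triples, n_factors=10, n_epochs=20, lr=0.005, reg=0.02, seed=0):
    """Fit r_hat = mu + b_u + b_i + q_i . p_u to (user, item, rating) triples with SGD.

    Returns a predict(user, item) function. Hyperparameter values are illustrative.
    """
    rng = np.random.default_rng(seed)
    users = {u: k for k, u in enumerate(sorted({u for u, _, _ in triples}))}
    items = {i: k for k, i in enumerate(sorted({i for _, i, _ in triples}))}
    mu = float(np.mean([r for _, _, r in triples]))  # global mean rating
    bu, bi = np.zeros(len(users)), np.zeros(len(items))
    P = rng.normal(0.0, 0.1, (len(users), n_factors))  # user factors p_u
    Q = rng.normal(0.0, 0.1, (len(items), n_factors))  # item factors q_i
    order = np.arange(len(triples))
    for _ in range(n_epochs):
        rng.shuffle(order)  # visit ratings in a new random order each epoch
        for idx in order:
            u_raw, i_raw, r = triples[idx]
            u, i = users[u_raw], items[i_raw]
            err = r - (mu + bu[u] + bi[i] + Q[i] @ P[u])
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            # Both factor updates use the pre-update values of P[u] and Q[i]
            P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                          Q[i] + lr * (err * P[u] - reg * Q[i]))

    def predict(u_raw, i_raw, lo=0.5, hi=5.0):
        u, i = users.get(u_raw), items.get(i_raw)
        est = mu
        # Unknown users/items contribute zero bias and no factor term
        if u is not None:
            est += bu[u]
        if i is not None:
            est += bi[i]
        if u is not None and i is not None:
            est += Q[i] @ P[u]
        return float(min(hi, max(lo, est)))  # clip to the rating scale

    return predict

def rmse_score(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    errs = np.array([t - e for t, e in pairs], dtype=float)
    return float(np.sqrt(np.mean(errs ** 2)))

toy_train = [(1, 10, 4.0), (1, 20, 3.0), (2, 10, 5.0), (2, 30, 4.0), (3, 20, 2.0), (3, 30, 3.0)]
predict = train_biased_mf(toy_train, n_epochs=50)
print(rmse_score([(r, predict(u, i)) for u, i, r in toy_train]))  # training-set RMSE
```

On toy data this only demonstrates the mechanics; the number that reflects generalization is the RMSE on held-out ratings, as computed by `accuracy.rmse` in the cell above.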


In [8]:
# Get all movie IDs that appear in the ratings
movie_ids = df['movieId'].unique()

# Find movies not yet rated by user 1
user_rated_movies = df[df['userId'] == 1]['movieId']
unseen_movies = list(set(movie_ids) - set(user_rated_movies))

# Predict user 1's rating for each unseen movie
# (a new name, so the test-set `predictions` from the previous cell are not overwritten)
unseen_predictions = [(movie, model.predict(1, movie).est) for movie in unseen_movies]

# Keep the 20 movies with the highest predicted rating
top_movies = sorted(unseen_predictions, key=lambda x: x[1], reverse=True)[:20]

# Display top recommendations (isin() lists them in movieId order, not by predicted rating)
recommended_movies = movies[movies['movieId'].isin([movie[0] for movie in top_movies])]
print(recommended_movies)


      movieId                                              title  \
277       318                   Shawshank Redemption, The (1994)   
413       475                   In the Name of the Father (1993)   
613       778                               Trainspotting (1996)   
694       912                                  Casablanca (1942)   
709       928                                     Rebecca (1940)   
878      1172     Cinema Paradiso (Nuovo cinema Paradiso) (1989)   
906      1204                          Lawrence of Arabia (1962)   
924      1223    Grand Day Out with Wallace and Gromit, A (1989)   
965      1266                                  Unforgiven (1992)   
975      1276                              Cool Hand Luke (1967)   
2260     3000           Princess Mononoke (Mononoke-hime) (1997)   
2743     3681  For a Few Dollars More (Per qualche dollaro in...   
3141     4226                                     Memento (2000)   
3562     4878                                Don
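The table above comes from `isin`, which returns rows in the catalogue's own order, so the top-20 ranking and the predicted scores are lost. One way to keep both is to build a DataFrame from the ranked `(movieId, score)` list and left-merge the titles onto it; a left merge preserves the order of the left frame's keys. The toy data below is hypothetical.

```python
import pandas as pd

movies_toy = pd.DataFrame({"movieId": [1, 2, 3, 4], "title": ["A", "B", "C", "D"]})
# Hypothetical (movieId, predicted rating) pairs, already sorted best-first
top = [(3, 4.9), (1, 4.7), (4, 4.2)]

# isin() keeps the catalogue's row order, so the ranking is lost
by_isin = movies_toy[movies_toy["movieId"].isin([m for m, _ in top])]

# A left merge onto the ranked list keeps its order and carries the scores along
ranked = pd.DataFrame(top, columns=["movieId", "predicted_rating"]).merge(
    movies_toy, on="movieId", how="left"
)
print(ranked)
```

With the notebook's variables, the equivalent would be `pd.DataFrame(top_movies, columns=['movieId', 'predicted_rating']).merge(movies, on='movieId', how='left')`.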

In [9]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Replace missing genres with empty strings (TfidfVectorizer lowercases text itself by default)
movies['genres'] = movies['genres'].fillna('').astype(str)

# TF-IDF vectorization of the genre strings
vectorizer = TfidfVectorizer(stop_words='english')
tfidf_matrix = vectorizer.fit_transform(movies['genres'])

# Compute the dense movie-by-movie cosine similarity matrix
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

# Function to get recommendations based on movie title
def recommend_movie(title, num_recommendations=5):
    matches = movies.index[movies['title'] == title]
    if len(matches) == 0:
        raise ValueError(f"Title not found: {title!r}")
    index = matches[0]
    # Exclude the query movie explicitly: with tied scores it is not guaranteed to sort first
    scores = [(i, s) for i, s in enumerate(cosine_sim[index]) if i != index]
    sorted_scores = sorted(scores, key=lambda x: x[1], reverse=True)[:num_recommendations]

    recommended_titles = [movies.iloc[i].title for i, _ in sorted_scores]
    return recommended_titles

# Example: Recommend movies similar to "Toy Story (1995)"
print(recommend_movie("Toy Story (1995)"))


['Antz (1998)', 'Toy Story 2 (1999)', 'Adventures of Rocky and Bullwinkle, The (2000)', "Emperor's New Groove, The (2000)", 'Monsters, Inc. (2001)']
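`TfidfVectorizer`'s default token pattern keeps runs of two or more word characters, so a genre such as `Sci-Fi` becomes the two tokens `sci` and `fi`. Splitting on `|` instead treats each genre as a single tag. The NumPy sketch below is an illustrative reimplementation under that assumption, not scikit-learn's code: raw counts, the smoothed IDF $\ln\frac{1+n}{1+\mathrm{df}(t)}+1$ that scikit-learn uses by default, and L2-normalized rows, so a plain dot product gives cosine similarity. `genre_tfidf` and the toy genre strings are hypothetical. In scikit-learn itself, passing a custom `tokenizer` to `TfidfVectorizer` is one way to get the same tokenization.

```python
import numpy as np

def genre_tfidf(genre_strings):
    """TF-IDF matrix over '|'-separated genre tags, with L2-normalized rows."""
    docs = [[g for g in s.split("|") if g] for s in genre_strings]
    vocab = sorted({g for doc in docs for g in doc})
    col = {g: j for j, g in enumerate(vocab)}
    n = len(docs)
    X = np.zeros((n, len(vocab)))
    for row, doc in enumerate(docs):
        for g in doc:
            X[row, col[g]] += 1.0  # raw term count
    doc_freq = (X > 0).sum(axis=0)
    idf = np.log((1 + n) / (1 + doc_freq)) + 1.0  # smoothed IDF
    X *= idf
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Rows with no genres stay all-zero instead of dividing by zero
    X = np.divide(X, norms, out=np.zeros_like(X), where=norms > 0)
    return X, vocab

genres = [
    "Adventure|Animation|Children|Comedy|Fantasy",
    "Adventure|Animation|Children|Comedy|Fantasy",
    "Comedy|Romance",
    "Sci-Fi|Thriller",
]
X, vocab = genre_tfidf(genres)
sim = X @ X.T  # unit-length rows, so dot products are cosine similarities
print(vocab)
print(np.round(sim, 3))
```

Movies with identical genre sets get a similarity of 1, which is why many titles tie at the top of the Toy Story recommendations above.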


In [10]:
!apt-get install git -y


Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
git is already the newest version (1:2.34.1-1ubuntu1.12).
0 upgraded, 0 newly installed, 0 to remove and 29 not upgraded.


In [11]:
!git config --global user.name "msr0b0tjennica"
!git config --global user.email "jennicabhaskaran@gmail.com"


In [14]:
from getpass import getpass

# Prompt for a GitHub Personal Access Token (not needed to clone a public repo;
# it would only be needed for pushing changes over HTTPS)
token = getpass('Enter your GitHub Personal Access Token: ')

# Clone the repository
repo_url = 'https://github.com/msr0b0tjennica/movies-recommendation-system'
!git clone {repo_url}


Enter your GitHub Personal Access Token: ··········
Cloning into 'movies-recommendation-system'...
remote: Enumerating objects: 3, done.[K
remote: Counting objects: 100% (3/3), done.[K
remote: Compressing objects: 100% (2/2), done.[K
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)[K
Receiving objects: 100% (3/3), done.


In [19]:
import os
print(os.listdir())  # Lists all files in the current directory


['.config', 'ml-latest-small.zip', 'ml-latest-small', 'movies-recommendation-system', 'sample_data']


In [18]:
import shutil

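# Replace with your notebook's filename. In Colab the notebook is not saved in the
# working directory (see the os.listdir() output above), so this move fails.
notebook_filename = 'MovieRecommendationSystem.ipynb'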
repo_name = 'movies-recommendation-system'

shutil.move(notebook_filename, repo_name)


FileNotFoundError: [Errno 2] No such file or directory: 'MovieRecommendationSystem.ipynb'
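The move fails because, in Colab, the notebook file is not in the VM's working directory; the `os.listdir()` output above does not list it. A common workaround is to mount Google Drive with `google.colab.drive.mount('/content/drive')` and copy the file from there, or to use Colab's File menu to save a copy to GitHub. The sketch below is a defensive variant of the cell above: it copies (rather than moves, so the original stays in place) the first candidate path that exists and raises a clear error otherwise. `copy_first_existing` is a hypothetical helper, and the Drive path is an assumption about where Colab stored this notebook.

```python
import os
import shutil

def copy_first_existing(candidates, dest_dir):
    """Copy the first existing file in `candidates` into dest_dir and return the new path."""
    for path in candidates:
        if os.path.isfile(path):
            # copy2 leaves the original in place and preserves file metadata
            return shutil.copy2(path, dest_dir)
    raise FileNotFoundError("None of these paths exist: " + ", ".join(candidates))

# Hypothetical candidate locations. The Drive path assumes Google Drive is mounted at
# /content/drive and that Colab saved the notebook in the default "Colab Notebooks" folder.
candidates = [
    "MovieRecommendationSystem.ipynb",
    "/content/drive/MyDrive/Colab Notebooks/MovieRecommendationSystem.ipynb",
]
```

Usage in the notebook would then be `copy_first_existing(candidates, 'movies-recommendation-system')`.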