# **Title of Project**

Movie Recommendation System

-------------

## **Objective**

The goal of this project is to build a movie recommendation system in Python. The system predicts the rating a user would give a movie, and recommends movies accordingly, using collaborative filtering: it combines each user's past ratings with the preferences of similar users.



## **Data Source**


We use the MovieLens dataset, which contains movie ratings from many different users. It can be downloaded from the MovieLens website. The code below expects the files `movies.csv` and `ratings.csv` in the working directory.

## **Import Library**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression  # used in the Modeling section
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split


## **Import Data**

In [None]:

movies = pd.read_csv('movies.csv')  # Contains movieId, title, genres
ratings = pd.read_csv('ratings.csv')  # Contains userId, movieId, rating, timestamp

## **Describe Data**

In [None]:
# Display first few rows of both datasets
print(movies.head())
print(ratings.head())

# Data description for ratings
print(ratings.describe())

# Data types and missing values check
print(movies.info())
print(ratings.info())


## **Data Visualization**

In [None]:
# Distribution of movie ratings
plt.figure(figsize=(8,6))
sns.histplot(ratings['rating'], bins=10, kde=False)
plt.title('Distribution of Movie Ratings')
plt.xlabel('Rating')
plt.ylabel('Frequency')
plt.show()

# Count of ratings per movie
movie_ratings_count = ratings.groupby('movieId')['rating'].count()
plt.figure(figsize=(10,6))
sns.histplot(movie_ratings_count, bins=50, kde=False)
plt.title('Number of Ratings per Movie')
plt.xlabel('Number of Ratings')
plt.ylabel('Number of Movies')
plt.show()


## **Data Preprocessing**

In [None]:
# Merge movie and rating data
movie_ratings = pd.merge(ratings, movies, on='movieId')

# Create a pivot table (user-movie rating matrix)
user_movie_matrix = movie_ratings.pivot_table(index='userId', columns='title', values='rating')

# Fill missing values (NaNs) with 0; here 0 means "not rated", not a rating of zero
user_movie_matrix.fillna(0, inplace=True)
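
To make the shape of the user-movie matrix concrete, here is a minimal sketch on a hypothetical toy table (the users, titles, and ratings are invented for illustration and are not taken from MovieLens). It also measures sparsity, the fraction of user-movie pairs with no rating, before the NaNs are replaced with 0.

```python
import pandas as pd

# Hypothetical toy ratings, for illustration only (not MovieLens data)
toy_ratings = pd.DataFrame({
    'userId': [1, 1, 2, 3],
    'title':  ['A', 'B', 'A', 'C'],
    'rating': [4.0, 3.0, 5.0, 2.0],
})

# Rows are users, columns are titles, cells are ratings (NaN where unrated)
toy_matrix = toy_ratings.pivot_table(index='userId', columns='title', values='rating')

# Fraction of user-movie pairs that have no rating
sparsity = toy_matrix.isna().sum().sum() / toy_matrix.size

# Replace NaN with 0 so the matrix can be passed to similarity functions
toy_filled = toy_matrix.fillna(0)

print(toy_filled)
print(f"Sparsity: {sparsity:.2f}")
```

On real rating data the matrix is typically far sparser than in this toy case, which is why the meaning of the 0 entries matters in later steps.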


## **Define Target Variable (y) and Feature Variables (X)**

In [None]:
# In our model, the target variable (y) is the rating that a user gives to a movie. The feature variables (X) are the user ID and the movie ID, which together identify a user-movie interaction.

# Note that these IDs are categorical labels, not quantities. The Modeling section fits a simple linear regression on them as an introductory baseline; a matrix factorization model, which would learn latent user and movie features, is not implemented in this notebook.


# Target variable (y): movie ratings
y = ratings['rating']

# Feature variables (X): user IDs and movie IDs
X = ratings[['userId', 'movieId']]

print(X.head(), y.head())

## **Train Test Split**

In [None]:
# Train-test split on the ratings table (80% train, 20% test)
train_data, test_data = train_test_split(ratings, test_size=0.2, random_state=42)

# Features and target for each split
X_train = train_data[['userId', 'movieId']]
y_train = train_data['rating']
X_test = test_data[['userId', 'movieId']]
y_test = test_data['rating']

# Creating user-item matrices for train and test sets (0 means "not rated")
train_matrix = train_data.pivot_table(index='userId', columns='movieId', values='rating').fillna(0)
test_matrix = test_data.pivot_table(index='userId', columns='movieId', values='rating').fillna(0)
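
Because the split is random over individual ratings, some users or movies in the test set may not appear in the training set at all. The sketch below, on a hypothetical toy table, counts such "cold-start" test rows; the variable names `toy`, `seen_users`, `seen_movies`, and `cold` are illustrative only.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy ratings, for illustration only (not MovieLens data)
toy = pd.DataFrame({
    'userId':  [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    'movieId': [10, 20, 10, 30, 20, 30, 10, 40, 20, 50],
    'rating':  [4.0, 3.5, 5.0, 2.0, 3.0, 4.5, 1.0, 4.0, 2.5, 5.0],
})

toy_train, toy_test = train_test_split(toy, test_size=0.2, random_state=42)

# Users and movies that the training set knows about
seen_users = set(toy_train['userId'])
seen_movies = set(toy_train['movieId'])

# Test rows whose user or movie never appears in the training set
is_cold = ~toy_test['userId'].isin(seen_users) | ~toy_test['movieId'].isin(seen_movies)
cold = toy_test[is_cold]

print(f"{len(cold)} of {len(toy_test)} test ratings involve an unseen user or movie")
```

A similarity-based predictor has nothing to go on for these rows, so it can be worth reporting them separately when evaluating.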


## **Modeling**

In [None]:
# We will use a simple linear regression model as an introduction to predictive modeling for recommendation systems. This model will try to predict a user’s rating based on user and movie IDs as input features.


# Initialize a linear regression model
model = LinearRegression()

# Train the model using userId and movieId as features, and rating as the target
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

print(f"Predicted Ratings: {y_pred[:5]}")
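
Since user and movie IDs are arbitrary labels, a linear regression on them may do little better than predicting a constant. One way to check is to compare against a baseline that always predicts the mean training rating. The following is a minimal sketch on hypothetical toy ratings; `baseline` and `baseline_rmse` are illustrative names, not part of the notebook above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy split, for illustration only (not MovieLens data)
toy_train = pd.DataFrame({'rating': [4.0, 3.0, 5.0, 2.0, 4.0]})
toy_test = pd.DataFrame({'rating': [3.0, 5.0]})

# Constant prediction: the mean rating seen during training
baseline = toy_train['rating'].mean()
baseline_pred = np.full(len(toy_test), baseline)

# RMSE of the constant prediction on the held-out ratings
residuals = toy_test['rating'].to_numpy() - baseline_pred
baseline_rmse = np.sqrt(np.mean(residuals ** 2))

print(f"Baseline prediction: {baseline:.2f}")
print(f"Baseline RMSE: {baseline_rmse:.3f}")
```

If the regression's RMSE on the real test set is not clearly below the equivalent baseline value, the ID features are contributing little.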

## **Model Evaluation**

In [None]:
# After training the model, we evaluate its performance using Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE).

# Evaluate the model using Mean Absolute Error (MAE)
mae = mean_absolute_error(y_test, y_pred)
print(f'Mean Absolute Error (MAE): {mae}')

# Evaluate the model using Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error (MSE): {mse}')

# Evaluate the model using Root Mean Squared Error (RMSE)
rmse = np.sqrt(mse)
print(f'Root Mean Squared Error (RMSE): {rmse}')


# Explanation of Metrics:
# MAE: The average absolute difference between predicted and actual ratings.
# MSE: The average of the squared differences between predicted and actual ratings; large errors are penalized more heavily.
# RMSE: The square root of MSE, which expresses the error in the same units as the ratings.
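
As a quick check on these definitions, the three metrics can be computed by hand with NumPy. The `actual` and `predicted` arrays below are made-up values chosen to keep the arithmetic easy to follow.

```python
import numpy as np

# Made-up ratings, for illustration only
actual = np.array([4.0, 3.0, 5.0, 2.0])
predicted = np.array([3.5, 3.0, 4.0, 3.0])

errors = predicted - actual      # [-0.5, 0.0, -1.0, 1.0]

mae = np.mean(np.abs(errors))    # (0.5 + 0 + 1 + 1) / 4
mse = np.mean(errors ** 2)       # (0.25 + 0 + 1 + 1) / 4
rmse = np.sqrt(mse)              # square root of the MSE

print(f"MAE:  {mae}")
print(f"MSE:  {mse}")
print(f"RMSE: {rmse}")
```

Here MAE is 0.625, MSE is 0.5625, and RMSE is 0.75. RMSE exceeds MAE because squaring gives the two larger errors more weight.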


## **Prediction**

In [None]:
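# Cosine similarity between every pair of users, based on the training matrix
user_similarity = cosine_similarity(train_matrix)
user_similarity_df = pd.DataFrame(user_similarity, index=train_matrix.index, columns=train_matrix.index)

def predict_rating(user_id, movie_id):
    # No basis for a prediction if the user or movie is absent from the training data
    if user_id not in user_similarity_df.index or movie_id not in train_matrix.columns:
        return 0

    # Find similar users (every other user, most similar first)
    similar_users = user_similarity_df[user_id].drop(user_id).sort_values(ascending=False)

    # Get ratings for the movie by similar users
    similar_user_ratings = train_matrix.loc[similar_users.index, movie_id]

    # Keep only users who actually rated the movie (0 means "not rated")
    similar_user_ratings = similar_user_ratings[similar_user_ratings > 0]

    # Get the similarity scores of these users
    similarity_scores = similar_users.loc[similar_user_ratings.index]

    # Calculate weighted average of ratings
    weighted_sum = np.dot(similar_user_ratings, similarity_scores)
    similarity_sum = similarity_scores.sum()

    # Avoid division by zero
    if similarity_sum == 0:
        return 0

    predicted_rating = weighted_sum / similarity_sum
    return predicted_rating

# Example: Predict the rating for user 1 on movie with movieId=1
predicted_rating = predict_rating(1, 1)
print(f"Predicted Rating for User 1 on Movie 1: {predicted_rating}")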


## **Explanation**

- `similar_users`: the other users, ranked by their similarity to the target user.
- `similar_user_ratings`: the ratings those similar users gave to the selected movie.
- `weighted_sum`: the sum of those ratings, each weighted by that user's similarity to the target user.
- `similarity_sum`: the sum of the similarity scores, used to normalize the weighted sum.

The final predicted rating is `weighted_sum / similarity_sum`, that is, the similarity-weighted average. If `similarity_sum` is 0, the function returns 0.
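
The weighted average can be traced by hand on a small example. The sketch below uses a hypothetical 3-user, 3-movie matrix and a small helper `cosine` (an illustrative stand-in for `cosine_similarity`) to predict the target user's rating for a movie they have not rated. It assumes, as above, that 0 means "not rated" and uses only neighbours who rated the movie.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are movies, 0 = not rated
ratings_matrix = np.array([
    [5.0, 4.0, 0.0],  # target user, has not rated movie 2
    [5.0, 4.0, 4.0],  # neighbour with the same ratings on movies 0 and 1
    [1.0, 0.0, 2.0],  # neighbour with different tastes
])

def cosine(u, v):
    """Cosine similarity between two rating vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

target = ratings_matrix[0]
others = ratings_matrix[1:]
movie = 2

# Similarity of each neighbour to the target user
similarities = np.array([cosine(target, other) for other in others])

# Neighbours' ratings for the movie, restricted to those who rated it
neighbour_ratings = others[:, movie]
rated = neighbour_ratings > 0

weighted_sum = np.dot(neighbour_ratings[rated], similarities[rated])
similarity_sum = similarities[rated].sum()
predicted = weighted_sum / similarity_sum if similarity_sum else 0.0

print(f"Similarities: {similarities}")
print(f"Predicted rating: {predicted:.2f}")
```

The prediction falls between the two neighbours' ratings (4 and 2) but closer to 4, because the first neighbour is more similar to the target user and so carries more weight.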
