import pandas as pd
import difflib  # For finding close matches

# Load the dataset
file_path = '/content/combined_movies_ratings.csv'  # Update the path as needed
df = pd.read_csv(file_path)

# Preprocess the dataset
df.fillna(0, inplace=True)
df['title'] = df['title'].str.strip().str.lower()  # Normalize case for title matching

# Ensure 'rating' is numeric
df['rating'] = pd.to_numeric(df['rating'], errors='coerce')

# Function to recommend movies based on both rating and view count
def recommend_movies_combined_score(movie_name, df, top_n=10, alpha=0.5, beta=0.5):
    movie_name = movie_name.lower().strip()  # Normalize input for case-insensitive matching

    # Check if the movie exists
    if movie_name not in df['title'].unique():
        # Suggest closest matches
        close_matches = difflib.get_close_matches(movie_name, df['title'].unique(), n=1, cutoff=0.6)
        if close_matches:
            return close_matches[0], False
        else:
            return None, False

    # Find users who watched the given movie
    users_watched = df[df['title'] == movie_name]['userId'].unique()

    # Find all movies watched by these users
    movies_watched_by_users = df[df['userId'].isin(users_watched)]

    # Calculate average ratings and watch counts
    movie_stats = movies_watched_by_users.groupby(['title', 'genres']).agg(
        avg_rating=('rating', 'mean'),  # Average rating
        watch_count=('userId', 'count')  # Number of users who watched
    ).reset_index()

    # Exclude the input movie from recommendations
    movie_stats = movie_stats[movie_stats['title'] != movie_name]

    # Calculate the combined score (alpha * avg_rating + beta * watch_count)
    movie_stats['combined_score'] = alpha * movie_stats['avg_rating'] + beta * movie_stats['watch_count']

    # Sort by combined score (descending) and take the top N movies
    top_recommendations = movie_stats.sort_values(by='combined_score', ascending=False).head(top_n)

    return top_recommendations, True

# Input movie name
input_movie = input("Enter a movie name to get recommendations: ").strip()

# Get recommendations
recommendations, is_found = recommend_movies_combined_score(input_movie, df)

if not is_found:
    if recommendations is None:
        print(f"Movie '{input_movie}' not found in the dataset, and no similar titles were found.")
    else:
        print(f"Movie '{input_movie}' not found. Did you mean: {recommendations.title()}? (yes/no)")
        user_response = input().strip().lower()
        if user_response == 'yes':
            # Call the recommendation function again with the suggested movie
            corrected_movie = recommendations
            new_recommendations, _ = recommend_movies_combined_score(corrected_movie, df)
            print(f"\nTop recommendations based on '{corrected_movie.title()}':")
            for index, row in new_recommendations.iterrows():
                print(f"{row['title'].title()} ({row['genres']}) - Avg Rating: {row['avg_rating']:.1f}, Watched by {row['watch_count']} users, Score: {row['combined_score']:.2f}")
        else:
            print("No movie match found. Exiting...")
else:
    print(f"\nTop recommendations based on '{input_movie.title()}':")
    for index, row in recommendations.iterrows():
        print(f"{row['title'].title()} ({row['genres']}) - Avg Rating: {row['avg_rating']:.1f}, Watched by {row['watch_count']} users, Score: {row['combined_score']:.2f}")


Enter a movie name to get recommendations: CASINO
Movie 'CASINO' not found. Did you mean: Casino (1995)? (yes/no)
YES

Top recommendations based on 'Casino (1995)':
Pulp Fiction (1994) (Comedy|Crime|Drama|Thriller) - Avg Rating: 4.3, Watched by 69 users, Score: 36.64
Silence Of The Lambs, The (1991) (Crime|Horror|Thriller) - Avg Rating: 4.2, Watched by 65 users, Score: 34.62
Usual Suspects, The (1995) (Crime|Mystery|Thriller) - Avg Rating: 4.3, Watched by 61 users, Score: 32.67
Shawshank Redemption, The (1994) (Crime|Drama) - Avg Rating: 4.6, Watched by 59 users, Score: 31.79
Forrest Gump (1994) (Comedy|Drama|Romance|War) - Avg Rating: 4.0, Watched by 58 users, Score: 30.99
Fugitive, The (1993) (Thriller) - Avg Rating: 3.8, Watched by 58 users, Score: 30.88
Fargo (1996) (Comedy|Crime|Drama|Thriller) - Avg Rating: 4.2, Watched by 56 users, Score: 30.12
Seven (A.K.A. Se7En) (1995) (Mystery|Thriller) - Avg Rating: 4.2, Watched by 56 users, Score: 30.10
Terminator 2: Judgment Day (1991) (A


**Title: Movie Recommendation System Using Collaborative Filtering**

**Abstract:**
In this project, we developed a movie recommendation system based on collaborative filtering, a widely used technique for producing personalized suggestions from user preferences. Given an input movie, the system finds the users who rated it, collects the other movies those users rated, and ranks them by a weighted combination of **view count** and **average rating**. The system is implemented in Python with pandas, using a dataset containing movie ratings and metadata.

---

**1. Introduction**

In today’s digital world, the sheer volume of content available to users makes it difficult to discover new and relevant movies. Personalized recommendations help users find content that matches their interests and preferences. Collaborative filtering is a well-known technique in the field of recommendation systems, where the system suggests items based on the preferences of similar users.

This paper outlines the implementation of a **user-based collaborative filtering** movie recommendation system that uses **view count** (the number of users who watched a movie) and **average ratings** (user ratings) to generate a list of top recommended movies.

---

**2. Problem Statement**

With the increasing number of movies available on streaming platforms, users often struggle to find relevant content that fits their tastes. Traditional search methods may fail to provide personalized suggestions based on individual preferences. Thus, there is a need for a system that can effectively recommend movies based on collaborative filtering using both view count and ratings.

---

**3. Project Overview**

The project consists of the following key steps:
1. **Data Preprocessing**: Cleaning and transforming the movie dataset.
2. **Collaborative Filtering**: Applying collaborative filtering using both ratings and view counts.
3. **Recommendation Generation**: Recommending the top movies based on the weighted score of ratings and view counts.
4. **Evaluation and Output**: Displaying the recommended movies to the user.

---

**4. Dataset**

The dataset used in this project contains information on movie ratings, movie titles, genres, and user interactions. The dataset has the following columns:

- **userId**: Identifier for the user who rated a movie.
- **movieId**: Identifier for the movie that was rated.
- **rating**: Rating given by the user (on a scale from 1 to 5).
- **title**: Name of the movie.
- **genres**: Genres associated with the movie.
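
For illustration, a dataset with this schema can be mocked as a small pandas DataFrame. The rows below, including the `movieId` and genre values, are invented toy values rather than data from the project, and the `required` set is a hypothetical helper used only to check that the expected columns are present.

```python
import pandas as pd

# Toy rows (invented for illustration) following the column layout above.
toy = pd.DataFrame({
    "userId":  [1, 1, 2],
    "movieId": [16, 296, 16],
    "rating":  [4.0, 5.0, 3.0],
    "title":   ["Casino (1995)", "Pulp Fiction (1994)", "Casino (1995)"],
    "genres":  ["Crime|Drama", "Comedy|Crime|Drama|Thriller", "Crime|Drama"],
})

# Hypothetical sanity check: report any expected column that is absent.
required = {"userId", "movieId", "rating", "title", "genres"}
missing = required - set(toy.columns)
print("missing columns:", sorted(missing))
```

A check like this is cheap to run right after loading the CSV, before any grouping depends on those column names.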

---

**5. Methodology**

5.1 Data Preprocessing

The first step in the project is to preprocess the dataset to ensure consistency and handle missing values. Missing values are filled with `0` to avoid errors during analysis, the `title` column is stripped of surrounding whitespace and lowercased so that titles can be compared case-insensitively, and the `rating` column is coerced to a numeric type.

```python
df.fillna(0, inplace=True)
df['title'] = df['title'].str.strip().str.lower()
df['rating'] = pd.to_numeric(df['rating'], errors='coerce')
```
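
A minimal sketch of these steps on an invented toy frame. Filling with `0` turns a missing rating into a rating of 0, which lowers that movie's average. The second half shows one possible alternative that drops rows without a valid rating; it is an assumption for comparison, not the project's method. For brevity the fill is restricted to the `rating` column here, whereas the project applies `fillna` to the whole frame.

```python
import pandas as pd

# Toy data (invented): one rating is missing, titles differ in case and spacing.
raw = pd.DataFrame({
    "userId": [1, 2, 3],
    "title":  ["  Casino (1995) ", "CASINO (1995)", "Casino (1995)"],
    "rating": [4.0, None, 5.0],
})

# Variant 1: fill missing ratings with 0, then normalize titles and coerce ratings.
filled = raw.copy()
filled["rating"] = filled["rating"].fillna(0)
filled["title"] = filled["title"].str.strip().str.lower()
filled["rating"] = pd.to_numeric(filled["rating"], errors="coerce")
mean_filled = filled["rating"].mean()    # (4 + 0 + 5) / 3

# Variant 2 (assumption): coerce first, then drop rows with no valid rating.
dropped = raw.copy()
dropped["title"] = dropped["title"].str.strip().str.lower()
dropped["rating"] = pd.to_numeric(dropped["rating"], errors="coerce")
dropped = dropped.dropna(subset=["rating"])
mean_dropped = dropped["rating"].mean()  # (4 + 5) / 2

print(filled["title"].unique())          # all three rows share one normalized title
print(mean_filled, mean_dropped)
```

Which variant is appropriate depends on whether a missing rating should count against a movie; the sketch only makes the difference visible.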

5.2 Collaborative Filtering Approach

In collaborative filtering, the idea is to recommend movies based on the preferences of users who have shown interest in similar movies. We use **user-based collaborative filtering**, which involves finding users who have watched the same movie as the input movie and recommending movies that these users have also watched.

**Key Concept**:
- **View Count**: The number of users, among those who watched the input movie, who also watched a particular movie.
- **Average Rating**: The average rating given to that movie by the same group of users.

We then combine these two metrics into a **weighted score**, where we give higher priority to movies with a high rating and more viewers.
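
The co-watcher step described above can be sketched on a toy ratings table. The data and the names `target`, `co_watchers`, and `pool` are invented for illustration. This sketch counts distinct users with `nunique`, which is an assumption; the project's code counts rating rows, which gives the same result when each user rates a movie at most once.

```python
import pandas as pd

# Toy ratings (invented): user 3 never rated the target movie.
ratings = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2, 3, 3],
    "title":  ["casino", "heat", "fargo", "casino", "heat", "fargo", "heat"],
    "rating": [4.0, 5.0, 3.0, 3.0, 4.0, 5.0, 2.0],
})

target = "casino"

# Users who rated the target movie (users 1 and 2 in this toy data).
co_watchers = ratings.loc[ratings["title"] == target, "userId"].unique()

# Everything those users rated, excluding the target itself.
pool = ratings[ratings["userId"].isin(co_watchers) & (ratings["title"] != target)]

# Per-movie statistics computed only over the co-watchers.
stats = pool.groupby("title").agg(
    avg_rating=("rating", "mean"),
    watch_count=("userId", "nunique"),
).reset_index()
print(stats)
```

User 3's ratings do not enter the statistics at all, which is what makes the result specific to the input movie rather than a global popularity list.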

5.3 Calculation of Weighted Score

To calculate the weighted score for each movie, we use the following formula:

***Weighted Score = (Average Rating × Rating Weight) + (Watch Count × View Count Weight)***


In the implementation, both weights are function parameters that default to 0.5:
- **Rating Weight** (`alpha`): 0.5 (importance given to the rating).
- **View Count Weight** (`beta`): 0.5 (importance given to the number of users who watched the movie).

By combining these two factors, the system generates the **top N recommendations** based on the highest weighted scores.

5.4 Recommendation Generation

The recommendations are generated by selecting the movies that were watched by users who also watched the input movie and then excluding the input movie itself. The remaining movies are ranked by their weighted score, which the code stores in the `combined_score` column.

```python
movie_stats['combined_score'] = alpha * movie_stats['avg_rating'] + beta * movie_stats['watch_count']
top_recommendations = movie_stats.sort_values(by='combined_score', ascending=False).head(top_n)
```

This generates the top `N` movie recommendations based on both ratings and view counts.
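
A short worked example of the score on invented numbers. Watch counts are usually much larger than ratings, so the count term can dominate an unnormalized score. The second half applies min-max scaling to both columns before weighting; this is one possible adjustment and an assumption here, not part of the project's implementation. The function names `weighted_score` and `min_max` and the weights passed to them are illustrative.

```python
import pandas as pd

# Toy per-movie statistics (invented values).
stats = pd.DataFrame({
    "title": ["a", "b", "c"],
    "avg_rating": [4.5, 3.0, 4.0],
    "watch_count": [10, 60, 30],
})

def weighted_score(frame, rating_weight, count_weight):
    """Linear combination of average rating and watch count."""
    return rating_weight * frame["avg_rating"] + count_weight * frame["watch_count"]

# Unnormalized score: a = 7.25, b = 31.5, c = 17.0, so the count decides the order.
raw = weighted_score(stats, 0.5, 0.5)

def min_max(col):
    """Scale a column to [0, 1]; assumes the column is not constant."""
    return (col - col.min()) / (col.max() - col.min())

# Same weights after scaling both columns to a comparable range.
scaled = stats.assign(
    avg_rating=min_max(stats["avg_rating"]),
    watch_count=min_max(stats["watch_count"]),
)
norm = weighted_score(scaled, 0.5, 0.5)

print(stats.loc[raw.idxmax(), "title"])   # top movie by the unnormalized score
print(stats.loc[norm.idxmax(), "title"])  # top movie after scaling
```

In this toy data the unnormalized score ranks the most-watched movie first despite its low rating, while the scaled score favors the movie that does reasonably well on both measures.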

---


**6. Evaluation and Results**

The system recommends the top 10 movies based on a combination of **view count** and **average rating**. For example, if the user inputs the movie **“Casino”**, the system identifies other movies watched by users who also watched **“Casino”**, and ranks these movies by their weighted score.

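An excerpt from an actual run, in which the user entered "CASINO" and accepted the suggested title, looks like this:

```
Top recommendations based on 'Casino (1995)':
Pulp Fiction (1994) (Comedy|Crime|Drama|Thriller) - Avg Rating: 4.3, Watched by 69 users, Score: 36.64
Silence Of The Lambs, The (1991) (Crime|Horror|Thriller) - Avg Rating: 4.2, Watched by 65 users, Score: 34.62
Usual Suspects, The (1995) (Crime|Mystery|Thriller) - Avg Rating: 4.3, Watched by 61 users, Score: 32.67
```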

---

**7. Conclusion**

In this project, we implemented a movie recommendation system using **user-based collaborative filtering**. By combining **view count** and **average rating**, the system produces recommendations that favor highly rated movies that are popular among the users who watched the input movie, so the ranking reflects both user preferences and popularity.

Future work can include enhancing the system with more sophisticated algorithms such as **matrix factorization** or **content-based filtering** to provide even more personalized recommendations.

