# Movie Recommendation Analysis

This project builds a simple, rule-based movie recommendation system using IMDB ratings, audience votes, and genre information.


In [2]:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style="whitegrid")

df = pd.read_csv("data/movies_data_cleaned.csv")
```

In [3]:
```python
df.info()
```

```
<class 'pandas.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 17 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Series_Title     1000 non-null   str
 1   Released_Year    999 non-null    float64
 2   Certificate      1000 non-null   str
 3   Runtime_Minutes  1000 non-null   int64
 4   Genre            1000 non-null   str
 5   Subgenre         1000 non-null   str
 6   Subgenre 1       1000 non-null   str
 7   IMDB_Rating      1000 non-null   float64
 8   Meta_score       843 non-null    float64
 9   Director         1000 non-null   str
 10  Star1            1000 non-null   str
 11  Star2            1000 non-null   str
 12  Star3            1000 non-null   str
 13  Star4            1000 non-null   str
 14  No_of_Votes      1000 non-null   int64
 15  Gross            831 non-null    float64
 16  Gross_Millions   831 non-null    float64
dtypes: float64(5), int64(2), s
```

In [4]:
```python
df.head()
```

| | Series_Title | Released_Year | Certificate | Runtime_Minutes | Genre | Subgenre | Subgenre 1 | IMDB_Rating | Meta_score | Director | Star1 | Star2 | Star3 | Star4 | No_of_Votes | Gross | Gross_Millions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Shawshank Redemption | 1994.0 | A | 142 | Drama | Unknown | Unkown | 9.3 | 80.0 | Frank Darabont | Tim Robbins | Morgan Freeman | Bob Gunton | William Sadler | 2343110 | 28341469.0 | 28.341469 |
| 1 | The Godfather | 1972.0 | A | 175 | Crime | Drama | Unkown | 9.2 | 100.0 | Francis Ford Coppola | Marlon Brando | Al Pacino | James Caan | Diane Keaton | 1620367 | 134966411.0 | 134.966411 |
| 2 | The Dark Knight | 2008.0 | UA | 152 | Action | Crime | Drama | 9.0 | 84.0 | Christopher Nolan | Christian Bale | Heath Ledger | Aaron Eckhart | Michael Caine | 2303232 | 534858444.0 | 534.858444 |
| 3 | The Godfather: Part II | 1974.0 | A | 202 | Crime | Drama | Unkown | 9.0 | 90.0 | Francis Ford Coppola | Al Pacino | Robert De Niro | Robert Duvall | Diane Keaton | 1129952 | 57300000.0 | 57.3 |
| 4 | 12 Angry Men | 1957.0 | U | 96 | Crime | Drama | Unkown | 9.0 | 96.0 | Sidney Lumet | Henry Fonda | Lee J. Cobb | Martin Balsam | John Fiedler | 689845 | 4360000.0 | 4.36 |


In [5]:
```python
df.shape
```

```
(1000, 17)
```

In [6]:
```python
# Popularity-weighted score: IMDB rating times the natural log of the vote count
df['score'] = df['IMDB_Rating'] * np.log(df['No_of_Votes'])
```
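The score multiplies the rating by the natural logarithm of the vote count. Popularity helps, but with diminishing returns: a tenfold increase in votes adds only rating × ln(10), or about 2.3 × rating. In this dataset vote counts range from 25,088 to 2,343,110, so the log factor varies only between roughly 10.1 and 14.7.

As a quick check, the sketch below recomputes The Shawshank Redemption's score by hand with Python's `math.log`. The helper name `score` is hypothetical and is not part of the notebook.

```python
import math

def score(rating, votes):
    """Hypothetical helper: rating weighted by the natural log of the vote count."""
    return rating * math.log(votes)

# The Shawshank Redemption: 9.3 rating, 2,343,110 votes
shawshank = score(9.3, 2_343_110)
print(round(shawshank, 6))  # 136.403004

# Ten times the votes adds a fixed rating * ln(10), here 8.0 * 2.3026
gain = score(8.0, 1_000_000) - score(8.0, 100_000)
print(round(gain, 2))  # 18.42
```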

In [7]:
```python
df.sort_values('score', ascending=False)[
    ['Series_Title', 'Genre', 'IMDB_Rating', 'No_of_Votes', 'score']
].head(10)
```

| | Series_Title | Genre | IMDB_Rating | No_of_Votes | score |
|---|---|---|---|---|---|
| 0 | The Shawshank Redemption | Drama | 9.3 | 2343110 | 136.403004 |
| 2 | The Dark Knight | Action | 9.0 | 2303232 | 131.848415 |
| 1 | The Godfather | Crime | 9.2 | 1620367 | 131.543102 |
| 6 | Pulp Fiction | Crime | 8.9 | 1826188 | 128.317897 |
| 8 | Inception | Action | 8.8 | 2067042 | 127.966337 |
| 5 | The Lord of the Rings: The Return of the King | Action | 8.9 | 1642758 | 127.375795 |
| 9 | Fight Club | Drama | 8.8 | 1854740 | 127.012645 |
| 11 | Forrest Gump | Drama | 8.8 | 1809221 | 126.793981 |
| 10 | The Lord of the Rings: The Fellowship of the Ring | Action | 8.8 | 1661481 | 126.044335 |
| 3 | The Godfather: Part II | Crime | 9.0 | 1129952 | 125.439171 |


In [8]:
```python
df[
    (df['Genre'] == 'Action') &
    (df['IMDB_Rating'] >= 8.0)
].sort_values('score', ascending=False)[
    ['Series_Title', 'IMDB_Rating', 'No_of_Votes', 'score']
].head(5)
```

| | Series_Title | IMDB_Rating | No_of_Votes | score |
|---|---|---|---|---|
| 2 | The Dark Knight | 9.0 | 2303232 | 131.848415 |
| 8 | Inception | 8.8 | 2067042 | 127.966337 |
| 5 | The Lord of the Rings: The Return of the King | 8.9 | 1642758 | 127.375795 |
| 10 | The Lord of the Rings: The Fellowship of the Ring | 8.8 | 1661481 | 126.044335 |
| 14 | The Matrix | 8.7 | 1676426 | 124.68992 |
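This filter compares only the primary `Genre` column, so it misses any movie whose 'Action' label sits in `Subgenre` or `Subgenre 1`.

The sketch below uses hypothetical rows to contrast the primary-only test with a check across all three genre columns. It illustrates the difference only and is not part of the original pipeline.

```python
import pandas as pd

# Hypothetical rows mirroring the Genre/Subgenre columns of the real dataset
toy = pd.DataFrame({
    'Series_Title': ['Movie A', 'Movie B', 'Movie C'],
    'Genre': ['Action', 'Crime', 'Drama'],
    'Subgenre': ['Crime', 'Action', 'Unknown'],
    'Subgenre 1': ['Drama', 'Thriller', 'Unkown'],
})

# Primary-genre match, as in the cell above
primary_only = toy[toy['Genre'] == 'Action']['Series_Title'].tolist()

# Match 'Action' in any of the three genre columns
any_column = toy[
    (toy[['Genre', 'Subgenre', 'Subgenre 1']] == 'Action').any(axis=1)
]['Series_Title'].tolist()

print(primary_only)  # ['Movie A']
print(any_column)    # ['Movie A', 'Movie B']
```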


In [9]:
```python
df['No_of_Votes'].describe()
```

```
count    1.000000e+03
mean     2.736929e+05
std      3.273727e+05
min      2.508800e+04
25%      5.552625e+04
50%      1.385485e+05
75%      3.741612e+05
max      2.343110e+06
Name: No_of_Votes, dtype: float64
```

In [10]:
```python
def recommend_movies(genre, min_rating=8.0, min_votes=100_000, top_n=5):
    # Filter on the primary genre plus rating and vote thresholds, then rank by score
    mask = (df['Genre'] == genre) & (df['IMDB_Rating'] >= min_rating) & (df['No_of_Votes'] >= min_votes)
    recs = df[mask].sort_values('score', ascending=False)
    return recs[['Series_Title', 'IMDB_Rating', 'No_of_Votes', 'score']].head(top_n)
```
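`recommend_movies` reads the global `df`, which makes it awkward to try on hand-made data. The sketch below is a hypothetical variant, `recommend_movies_from`, that takes the DataFrame as its first argument but otherwise applies the same filters and ordering. The toy titles and numbers are invented to show which threshold removes which row.

```python
import numpy as np
import pandas as pd

def recommend_movies_from(frame, genre, min_rating=8.0, min_votes=100_000, top_n=5):
    """Hypothetical variant of recommend_movies that takes the DataFrame explicitly."""
    mask = (
        (frame['Genre'] == genre)
        & (frame['IMDB_Rating'] >= min_rating)
        & (frame['No_of_Votes'] >= min_votes)
    )
    recs = frame[mask].sort_values('score', ascending=False)
    return recs[['Series_Title', 'IMDB_Rating', 'No_of_Votes', 'score']].head(top_n)

# Toy data (invented titles and numbers)
toy = pd.DataFrame({
    'Series_Title': ['Hidden Gem', 'Blockbuster', 'Cult Classic', 'Average Hit'],
    'Genre': ['Action', 'Action', 'Action', 'Action'],
    'IMDB_Rating': [9.1, 8.5, 8.7, 7.5],
    'No_of_Votes': [20_000, 1_500_000, 300_000, 2_000_000],
})
toy['score'] = toy['IMDB_Rating'] * np.log(toy['No_of_Votes'])

result = recommend_movies_from(toy, 'Action')
print(result['Series_Title'].tolist())  # ['Blockbuster', 'Cult Classic']
```

'Hidden Gem' is dropped by the vote threshold and 'Average Hit' by the rating threshold. Among the survivors, the larger vote count gives 'Blockbuster' the higher score despite its lower rating.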

In [11]:
```python
recommend_movies('Action')
```

| | Series_Title | IMDB_Rating | No_of_Votes | score |
|---|---|---|---|---|
| 2 | The Dark Knight | 9.0 | 2303232 | 131.848415 |
| 8 | Inception | 8.8 | 2067042 | 127.966337 |
| 5 | The Lord of the Rings: The Return of the King | 8.9 | 1642758 | 127.375795 |
| 10 | The Lord of the Rings: The Fellowship of the Ring | 8.8 | 1661481 | 126.044335 |
| 14 | The Matrix | 8.7 | 1676426 | 124.68992 |


In [12]:
```python
recommend_movies('Drama', min_rating=8.3, min_votes=100_000)
```

| | Series_Title | IMDB_Rating | No_of_Votes | score |
|---|---|---|---|---|
| 0 | The Shawshank Redemption | 9.3 | 2343110 | 136.403004 |
| 9 | Fight Club | 8.8 | 1854740 | 127.012645 |
| 11 | Forrest Gump | 8.8 | 1809221 | 126.793981 |
| 24 | Saving Private Ryan | 8.6 | 1235804 | 120.634198 |
| 17 | One Flew Over the Cuckoo's Nest | 8.7 | 918088 | 119.451422 |


In [13]:
```python
genre_cols = ['Genre', 'Subgenre', 'Subgenre 1']
```

In [14]:
```python
genre_cols
```

```
['Genre', 'Subgenre', 'Subgenre 1']
```

In [15]:
```python
df[genre_cols]
```

| | Genre | Subgenre | Subgenre 1 |
|---|---|---|---|
| 0 | Drama | Unknown | Unkown |
| 1 | Crime | Drama | Unkown |
| 2 | Action | Crime | Drama |
| 3 | Crime | Drama | Unkown |
| 4 | Crime | Drama | Unkown |
| ... | ... | ... | ... |
| 995 | Comedy | Drama | Romance |
| 996 | Drama | Western | Unkown |
| 997 | Drama | Romance | War |
| 998 | Drama | War | Unkown |


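In [16]:
```python
# Treat 'Unknown' and the misspelled 'Unkown' placeholders as missing, then list each row's remaining genres
df['genres'] = df[genre_cols].replace(['Unknown', 'Unkown'], np.nan).apply(lambda row: row.dropna().tolist(), axis=1)
```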
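The cleaning step maps both placeholder spellings to missing values and keeps the remaining genre labels in column order. The sketch below applies the same expression to a few hypothetical rows that use the same placeholders. Each movie gets one list, and a movie with two placeholders ends up with a single-element list.

```python
import numpy as np
import pandas as pd

genre_cols = ['Genre', 'Subgenre', 'Subgenre 1']

# Hypothetical rows using the dataset's placeholders, including the misspelled 'Unkown'
toy = pd.DataFrame({
    'Genre': ['Drama', 'Crime', 'Action'],
    'Subgenre': ['Unknown', 'Drama', 'Crime'],
    'Subgenre 1': ['Unkown', 'Unkown', 'Drama'],
})

# Placeholders become NaN, then each row's non-missing labels are collected into a list
toy['genres'] = (
    toy[genre_cols]
    .replace(['Unknown', 'Unkown'], np.nan)
    .apply(lambda row: row.dropna().tolist(), axis=1)
)
print(toy['genres'].tolist())
# [['Drama'], ['Crime', 'Drama'], ['Action', 'Crime', 'Drama']]
```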

In [17]:
```python
df[['Series_Title', 'genres']].head(10)
```

| | Series_Title | genres |
|---|---|---|
| 0 | The Shawshank Redemption | [Drama] |
| 1 | The Godfather | [Crime, Drama] |
| 2 | The Dark Knight | [Action, Crime, Drama] |
| 3 | The Godfather: Part II | [Crime, Drama] |
| 4 | 12 Angry Men | [Crime, Drama] |
| 5 | The Lord of the Rings: The Return of the King | [Action, Adventure, Drama] |
| 6 | Pulp Fiction | [Crime, Drama] |
| 7 | Schindler's List | [Biography, Drama, History] |
| 8 | Inception | [Action, Adventure, Sci-Fi] |
| 9 | Fight Club | [Drama] |


In [18]:
```python
df['genres_set'] = df['genres'].apply(set)
```

In [19]:
```python
def recommend_by_genres(preferred_genres, min_rating=8.0, min_votes=100_000, min_overlap=2, top_n=5):
    preferred_set = set(preferred_genres)

    # Candidate pool: movies that clear the rating and vote thresholds
    recs = df[
        (df['IMDB_Rating'] >= min_rating) &
        (df['No_of_Votes'] >= min_votes)
    ].copy()

    # Keep movies sharing at least `min_overlap` of the preferred genres;
    # with the default of 2, pass min_overlap=1 for single-genre queries
    recs['genre_overlap'] = recs['genres_set'].apply(lambda s: len(s & preferred_set))
    recs = recs[recs['genre_overlap'] >= min_overlap]

    # Rank by number of shared genres first, then by score
    recs = recs.sort_values(['genre_overlap', 'score'], ascending=[False, False])

    return recs[['Series_Title', 'Released_Year', 'IMDB_Rating', 'No_of_Votes', 'genres', 'genre_overlap', 'score']].head(top_n)
```
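`genre_overlap` is the size of the intersection between a movie's genre set and the preferred set, and the threshold sets how many shared genres are required. With two preferred genres, requiring two means a movie must match both. That is why every row in the genre-based outputs in this notebook has `genre_overlap` equal to 2.

The sketch below uses hypothetical genre sets to show which movies survive an "at least 1" versus an "at least 2" requirement.

```python
# Hypothetical movies and their genre sets
movies = {
    'Movie A': {'Action', 'Drama', 'Mystery'},
    'Movie B': {'Action', 'Adventure'},
    'Movie C': {'Comedy', 'Romance'},
}
preferred = {'Action', 'Mystery'}

# Number of preferred genres each movie shares (set intersection size)
overlap = {title: len(genres & preferred) for title, genres in movies.items()}
print(overlap)  # {'Movie A': 2, 'Movie B': 1, 'Movie C': 0}

def survivors(min_overlap):
    """Titles sharing at least min_overlap preferred genres."""
    return [title for title, n in overlap.items() if n >= min_overlap]

print(survivors(1))  # ['Movie A', 'Movie B']
print(survivors(2))  # ['Movie A']
```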


In [20]:
```python
recommend_by_genres(['Action', 'Mystery'])
```

| | Series_Title | Released_Year | IMDB_Rating | No_of_Votes | genres | genre_overlap | score |
|---|---|---|---|---|---|---|---|
| 68 | Oldeuboi | 2003.0 | 8.4 | 515451 | [Action, Drama, Mystery] | 2 | 110.483499 |
| 362 | The Bourne Ultimatum | 2007.0 | 8.0 | 604694 | [Action, Mystery, Thriller] | 2 | 106.499823 |
| 340 | Blade Runner 2049 | 2017.0 | 8.0 | 461823 | [Action, Drama, Mystery] | 2 | 104.343496 |


In [21]:
```python
def recommend_similar(movie_title, min_votes=100_000, top_n=5):
    # Case-insensitive exact-title lookup for the reference movie
    base = df[df['Series_Title'].str.lower() == movie_title.lower()]
    if base.empty:
        return f"Movie not found: {movie_title}"

    base = base.iloc[0]
    base_genres = base['genres_set']

    # Candidate pool: sufficiently popular movies, excluding the reference movie itself
    recs = df[(df['No_of_Votes'] >= min_votes) & (df['Series_Title'] != base['Series_Title'])].copy()

    # Count genres shared with the reference movie and drop movies with none in common
    recs['genre_overlap'] = recs['genres_set'].apply(lambda s: len(s & base_genres))
    recs = recs[recs['genre_overlap'] > 0]

    # Rank by shared genres first, then by score
    recs = recs.sort_values(['genre_overlap', 'score'], ascending=[False, False])

    return recs[['Series_Title', 'Released_Year', 'IMDB_Rating', 'No_of_Votes', 'genres', 'genre_overlap', 'score']].head(top_n)
```
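When the reference movie has only one genre, every candidate that shares it gets `genre_overlap` equal to 1. The first sort key is then a tie, and the order is decided by `score` alone. The toy sketch below isolates that ranking step, using hypothetical titles, genres and scores.

```python
import pandas as pd

# Hypothetical candidates with precomputed scores
candidates = pd.DataFrame({
    'Series_Title': ['Film X', 'Film Y', 'Film Z', 'Film W'],
    'genres_set': [{'Drama'}, {'Crime', 'Drama'}, {'Action', 'Crime', 'Drama'}, {'Comedy'}],
    'score': [110.0, 125.0, 118.0, 130.0],
})
base_genres = {'Drama'}  # a reference movie with a single genre

# Every Drama-tagged candidate shares exactly one genre; Film W shares none
candidates['genre_overlap'] = candidates['genres_set'].apply(lambda s: len(s & base_genres))
ranked = (
    candidates[candidates['genre_overlap'] > 0]
    .sort_values(['genre_overlap', 'score'], ascending=[False, False])
)
print(ranked['Series_Title'].tolist())  # ['Film Y', 'Film Z', 'Film X']
```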
    

In [22]:
```python
recommend_similar("fight club")
```

| | Series_Title | Released_Year | IMDB_Rating | No_of_Votes | genres | genre_overlap | score |
|---|---|---|---|---|---|---|---|
| 0 | The Shawshank Redemption | 1994.0 | 9.3 | 2343110 | [Drama] | 1 | 136.403004 |
| 2 | The Dark Knight | 2008.0 | 9.0 | 2303232 | [Action, Crime, Drama] | 1 | 131.848415 |
| 1 | The Godfather | 1972.0 | 9.2 | 1620367 | [Crime, Drama] | 1 | 131.543102 |
| 6 | Pulp Fiction | 1994.0 | 8.9 | 1826188 | [Crime, Drama] | 1 | 128.317897 |
| 5 | The Lord of the Rings: The Return of the King | 2003.0 | 8.9 | 1642758 | [Action, Adventure, Drama] | 1 | 127.375795 |


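In [23]:
```python
def find_title(query, n=5):
    # Case-insensitive literal substring search; regex=False keeps characters like '.' or '(' literal
    matches = df[df['Series_Title'].str.contains(query, case=False, regex=False, na=False)]
    return matches[['Series_Title', 'Released_Year', 'IMDB_Rating', 'No_of_Votes']].head(n)
```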
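`Series.str.contains` interprets its pattern as a regular expression unless `regex=False` is passed. A user query containing characters such as `.`, `(` or `?` can therefore match more than intended, or fail to compile. The sketch below uses hypothetical titles to show the difference for a query of `.`.

```python
import pandas as pd

# Hypothetical titles; only the first contains a literal period
titles = pd.Series(['Sequel 2.5', 'Fight Night'])

# Default (regex=True): '.' means "any character", so every non-empty title matches
regex_hits = titles[titles.str.contains('.', case=False)].tolist()

# regex=False: '.' is matched as a literal period
literal_hits = titles[titles.str.contains('.', case=False, regex=False)].tolist()

print(regex_hits)    # ['Sequel 2.5', 'Fight Night']
print(literal_hits)  # ['Sequel 2.5']
```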

In [24]:
```python
find_title("fight")
```

| | Series_Title | Released_Year | IMDB_Rating | No_of_Votes |
|---|---|---|---|---|
| 9 | Fight Club | 1999.0 | 8.8 | 1854740 |
| 614 | The Fighter | 2010.0 | 7.8 | 340584 |


## Demo: Example Recommendations & Observations

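In [25]:
```python
recommend_movies('Mystery')
```

| | Series_Title | IMDB_Rating | No_of_Votes | score |
|---|---|---|---|---|
| 69 | Memento | 8.4 | 1125712 | 117.044981 |
| 145 | Shutter Island | 8.2 | 1129894 | 114.288602 |
| 81 | Rear Window | 8.4 | 444074 | 109.231471 |
| 119 | Vertigo | 8.3 | 364368 | 106.289133 |
| 393 | Twelve Monkeys | 8.0 | 578443 | 106.144762 |

Results are ranked by score among movies whose primary genre is Mystery and that clear the default thresholds (rating of at least 8.0 and at least 100,000 votes). Memento and Rear Window share an 8.4 rating, but Memento's larger vote count (1,125,712 vs. 444,074) lifts its score by about 7.8 points (8.4 × ln(1,125,712 / 444,074) ≈ 7.8). The same effect places Shutter Island (8.2) above Rear Window.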

In [26]:
```python
recommend_by_genres(['Thriller', 'Mystery'])
```

| | Series_Title | Released_Year | IMDB_Rating | No_of_Votes | genres | genre_overlap | score |
|---|---|---|---|---|---|---|---|
| 41 | The Usual Suspects | 1995.0 | 8.5 | 991208 | [Crime, Mystery, Thriller] | 2 | 117.356777 |
| 69 | Memento | 2000.0 | 8.4 | 1125712 | [Mystery, Thriller] | 2 | 117.044981 |
| 145 | Shutter Island | 2010.0 | 8.2 | 1129894 | [Mystery, Thriller] | 2 | 114.288602 |
| 49 | Psycho | 1960.0 | 8.5 | 604211 | [Horror, Mystery, Thriller] | 2 | 113.149269 |
| 248 | The Sixth Sense | 1999.0 | 8.1 | 911573 | [Drama, Mystery, Thriller] | 2 | 111.155708 |

All five results match both preferred genres (`genre_overlap` of 2), so they are ordered by score within that tier. With the default settings, movies matching only one of the two genres are excluded rather than ranked lower.

In [27]:
```python
recommend_similar('Fight club')
```

| | Series_Title | Released_Year | IMDB_Rating | No_of_Votes | genres | genre_overlap | score |
|---|---|---|---|---|---|---|---|
| 0 | The Shawshank Redemption | 1994.0 | 9.3 | 2343110 | [Drama] | 1 | 136.403004 |
| 2 | The Dark Knight | 2008.0 | 9.0 | 2303232 | [Action, Crime, Drama] | 1 | 131.848415 |
| 1 | The Godfather | 1972.0 | 9.2 | 1620367 | [Crime, Drama] | 1 | 131.543102 |
| 6 | Pulp Fiction | 1994.0 | 8.9 | 1826188 | [Crime, Drama] | 1 | 128.317897 |
| 5 | The Lord of the Rings: The Return of the King | 2003.0 | 8.9 | 1642758 | [Action, Adventure, Drama] | 1 | 127.375795 |

Fight Club's only genre is Drama, so every candidate shares at most one genre with it (`genre_overlap` of 1). The ranking therefore falls back to score, and the results are simply the highest-scoring Drama-tagged movies.

In [28]:
```python
find_title("fight")
```

| | Series_Title | Released_Year | IMDB_Rating | No_of_Votes |
|---|---|---|---|---|
| 9 | Fight Club | 1999.0 | 8.8 | 1854740 |
| 614 | The Fighter | 2010.0 | 7.8 | 340584 |


## Project Summary

This project implements a content-based movie recommendation system using the IMDB Top 1000 Movies dataset.

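Movies are recommended based on genre similarity, IMDB rating, and audience popularity.
Candidates are first filtered by minimum IMDB rating and vote count. They are then ranked by genre overlap and by a score that multiplies the rating by the natural log of the vote count.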

All recommendations are generated dynamically and are fully explainable.
This project focuses on building clear and reliable recommendation logic rather than using machine learning models.