# 📚 Importing Libraries  

Before diving into data analysis, let’s bring in the power of Python libraries 🐍✨  

```python
# 🔧 Data Handling
import pandas as pd
import numpy as np

# 📊 Visualization
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# 🎨 Set Styles
sns.set(style="whitegrid")
plt.rcParams["figure.figsize"] = (10, 6)
```


In [127]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# 🎬 Top Rated Movies Dataset Upload Guide  

Want to add your dataset? Follow these simple steps and bring the magic of movies into this repo! 🍿  

---

## 📁 Step 1: Place Your Dataset  
Copy your dataset (e.g., `movie_dataset.csv`) into this project folder:  



In [128]:
df = pd.read_csv('imdb-top-rated-movies-user-rated.csv')

In [129]:
# Shape of dataset (rows, columns)
print(f"Dataset Shape: {df.shape}")
print('-------------------------------------------')

# Data types and non-null counts
print("Dataset Info:")
df.info()
print('---------------------------------------------')

# Summary statistics for numeric columns (shown as the cell's last expression)
print("Summary Statistics:")
df.describe()


Dataset Shape: (950, 14)
-------------------------------------------
Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 950 entries, 0 to 949
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Rank             950 non-null    int64  
 1   Title            950 non-null    object 
 2   IMDb Rating      950 non-null    float64
 3   Votes            950 non-null    object 
 4   Poster URL       950 non-null    object 
 5   Video URL        918 non-null    object 
 6   Meta Score       793 non-null    float64
 7   Tags             950 non-null    object 
 8   Director         950 non-null    object 
 9   Description      950 non-null    object 
 10  Writers          950 non-null    object 
 11  Stars            949 non-null    object 
 12  Summary          298 non-null    object 
 13  Worldwide Gross  53 non-null     object 
dtypes: float64(2), int64(1), object(11)
memory usage: 104.0+ KB
---------------------------------------------
Summary Statistics:

| | Rank | IMDb Rating | Meta Score |
|---|---|---|---|
| count | 950.0 | 950.0 | 793.0 |
| mean | 475.5 | 7.944632 | 79.142497 |
| std | 274.385677 | 0.217292 | 11.864497 |
| min | 1.0 | 7.6 | 30.0 |
| 25% | 238.25 | 7.8 | 72.0 |
| 50% | 475.5 | 7.9 | 80.0 |
| 75% | 712.75 | 8.1 | 88.0 |
| max | 950.0 | 8.5 | 100.0 |
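
`describe()` summarizes only numeric columns by default, which is why the summary above covers just `Rank`, `IMDb Rating`, and `Meta Score`. A minimal sketch of how text columns could be summarized as well, using a small hypothetical frame (`sample` and its values are made up for illustration and are not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame; values are illustrative only.
sample = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "Director": ["X", "X", "Y"],
    "IMDb Rating": [7.6, 8.1, 7.9],
})

# Default behaviour: only numeric columns are summarized.
numeric_summary = sample.describe()

# Excluding numeric dtypes yields count / unique / top / freq for text columns.
text_summary = sample.describe(exclude=[np.number])
print(text_summary)
```

On the real data the same call would be `df.describe(exclude=[np.number])`, which reports how many distinct directors or titles appear and which value is most frequent.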


In [130]:
df.head()

| | Rank | Title | IMDb Rating | Votes | Poster URL | Video URL | Meta Score | Tags | Director | Description | Writers | Stars | Summary | Worldwide Gross |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Once Upon a Time... in Hollywood | 7.6 | 927K | https://www.imdb.com/title/tt7131622/mediaview... | https://imdb-video.media-imdb.com/vi1385741849... | 84.0 | "Period Drama, Showbiz Drama, Comedy, Drama" | Quentin Tarantino | "As Hollywood's Golden Age is winding down dur... | Quentin Tarantino | "Leonardo DiCaprio, Brad Pitt, Margot Robbie" | "Reviewers say 'Once Upon a Time in Hollywood'... | - |
| 1 | 2 | Mission: Impossible - Dead Reckoning Part One | 7.6 | 311K | https://www.imdb.com/title/tt9603212/mediaview... | https://imdb-video.media-imdb.com/vi3500918553... | 81.0 | "Action Epic, Adventure Epic, Spy, Action, Adv... | Christopher McQuarrie | Ethan Hunt and his IMF team must track down a ... | "Bruce Geller, Christopher McQuarrie, Erik Jen... | "Tom Cruise, Hayley Atwell, Ving Rhames" | "Reviewers say 'Mission: Impossible - Dead Rec... | - |
| 2 | 3 | John Wick: Chapter 4 | 7.6 | 392K | https://www.imdb.com/title/tt10366206/mediavie... | https://imdb-video.media-imdb.com/vi289916185/... | 78.0 | "Action Epic, Gun Fu, One,Person Army Action, ... | Chad Stahelski | "John Wick uncovers a path to defeating The Hi... | "Shay Hatten, Michael Finch, Derek Kolstad" | "Keanu Reeves, Laurence Fishburne, George Geor... | "Reviewers say 'John Wick: Chapter 4' is laude... | - |
| 3 | 4 | Watchmen | 7.6 | 603K | https://www.imdb.com/title/tt0409459/mediaview... | https://imdb-video.media-imdb.com/vi240565017/... | 56.0 | "Dystopian Sci,Fi, Superhero, Action, Drama, M... | Zack Snyder | "In a version of 1985 where superheroes exist-... | "Dave Gibbons, David Hayter, Alex Tse" | "Jackie Earle Haley, Patrick Wilson, Carla Gug... | "Reviewers say 'Watchmen' is acclaimed for its... | - |
| 4 | 5 | The Fifth Element | 7.6 | 533K | https://www.imdb.com/title/tt0119116/mediaview... | https://imdb-video.media-imdb.com/vi854720793/... | 52.0 | "Sci,Fi Epic, Space Sci,Fi, Action, Adventure,... | Luc Besson | "In the colorful future- a cab driver unwittin... | "Luc Besson, Robert Mark Kamen" | "Bruce Willis, Milla Jovovich, Gary Oldman" | - | - |


## Data Preprocessing: Transforming 'Votes'
The 'Votes' column is stored as strings with magnitude suffixes, such as '1.2K' or '5.6M'. For numerical analysis these need to be converted to plain numbers: a 'K' suffix multiplies the value by 1,000 and an 'M' suffix by 1,000,000. The converted values are stored as floats.

In [131]:
# One-line alternative (not used); assumes every value ends in 'K' or 'M':
# df['Votes'] = df['Votes'].apply(lambda x: float(x.split('K')[0]) * 1000 if 'K' in x
#                                 else float(x.split('M')[0]) * 1000000)

In [132]:
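def convert_votes(v):
    """Convert vote strings such as '927K' or '1.2M' to a float count."""
    if isinstance(v, str):
        if 'K' in v:
            # 'K' denotes thousands
            return float(v.replace('K', '')) * 1_000
        elif 'M' in v:
            # 'M' denotes millions
            return float(v.replace('M', '')) * 1_000_000
    # Non-string values (e.g. NaN) and strings without a suffix pass through unchanged
    return v

df['Votes'] = df['Votes'].apply(convert_votes)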

In [133]:
df['Votes'].dtype

dtype('float64')

In [134]:
df['Votes']

0      927000.0
1      311000.0
2      392000.0
3      603000.0
4      533000.0
         ...   
945    361000.0
946    120000.0
947    982000.0
948    993000.0
949    120000.0
Name: Votes, Length: 950, dtype: float64
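
The same suffix rule can also be applied without a Python-level function call per row. The following is a minimal sketch on hypothetical sample strings, not the notebook's actual column; the names `votes`, `MULTIPLIERS`, and `votes_numeric` are illustrative, and the sketch assumes values consist of a number optionally followed by `K` or `M`:

```python
import pandas as pd

# Hypothetical sample values; the real column comes from the CSV.
votes = pd.Series(["927K", "1.2M", "850", None])

# Multiplier per suffix; an empty suffix means the number is already a raw count.
MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "": 1}

# Split each string into its numeric part (column 0) and optional suffix (column 1).
parts = votes.str.extract(r"^\s*([\d.]+)\s*([KM]?)\s*$")
number = pd.to_numeric(parts[0], errors="coerce")
factor = pd.to_numeric(parts[1].map(MULTIPLIERS))

# Missing or unparseable entries stay NaN instead of raising an error.
votes_numeric = number * factor
print(votes_numeric.tolist())
```

Compared with `apply`, this version leaves malformed entries as `NaN`, which makes them easy to count with `isnull()` afterwards.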

In [135]:
df.columns

Index(['Rank', 'Title', 'IMDb Rating', 'Votes', 'Poster URL', 'Video URL',
       'Meta Score', 'Tags', 'Director', 'Description', 'Writers', 'Stars',
       'Summary', 'Worldwide Gross'],
      dtype='object')

In [136]:
# Percentage of null values in each column
df.isnull().sum() / df.shape[0] * 100

Rank                0.000000
Title               0.000000
IMDb Rating         0.000000
Votes               0.000000
Poster URL          0.000000
Video URL           3.368421
Meta Score         16.526316
Tags                0.000000
Director            0.000000
Description         0.000000
Writers             0.000000
Stars               0.105263
Summary            68.631579
Worldwide Gross    94.421053
dtype: float64

## 🧹 Removing Columns with More than 50% Null Values

In this dataset, some columns contain a high number of missing values.  
To keep the dataset clean and useful, we remove any column that has **more than 50% null values**.

```python
# Calculate percentage of missing values
null_percentage = df.isnull().mean() * 100

# Drop columns with more than 50% null values
df = df.drop(null_percentage[null_percentage > 50].index, axis=1)

# Show remaining columns
df.info()
```


In [None]:
df.drop(['Worldwide Gross', 'Summary'], axis=1, inplace=True, errors='ignore')
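
Deriving the columns to drop from the data, rather than naming them by hand, keeps this step valid if the dataset changes. A minimal sketch of the 50% rule on a hypothetical toy frame (`toy`, its column names, and `max_null_fraction` are assumptions made for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "mostly_missing" is 75% null, "some_missing" is 25% null.
toy = pd.DataFrame({
    "complete": [1, 2, 3, 4],
    "some_missing": [1.0, np.nan, 3.0, 4.0],
    "mostly_missing": [np.nan, np.nan, np.nan, 4.0],
})

max_null_fraction = 0.5  # assumed cutoff, matching the 50% rule described above

# Fraction of nulls per column, then keep only columns at or below the cutoff.
null_fraction = toy.isnull().mean()
cleaned = toy.loc[:, null_fraction <= max_null_fraction]

print(list(cleaned.columns))
```

Here only `mostly_missing` exceeds the cutoff, so it is the only column removed; `some_missing` is kept with its gaps intact.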

In [138]:
df.head()

| | Rank | Title | IMDb Rating | Votes | Poster URL | Video URL | Meta Score | Tags | Director | Description | Writers | Stars |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Once Upon a Time... in Hollywood | 7.6 | 927000.0 | https://www.imdb.com/title/tt7131622/mediaview... | https://imdb-video.media-imdb.com/vi1385741849... | 84.0 | "Period Drama, Showbiz Drama, Comedy, Drama" | Quentin Tarantino | "As Hollywood's Golden Age is winding down dur... | Quentin Tarantino | "Leonardo DiCaprio, Brad Pitt, Margot Robbie" |
| 1 | 2 | Mission: Impossible - Dead Reckoning Part One | 7.6 | 311000.0 | https://www.imdb.com/title/tt9603212/mediaview... | https://imdb-video.media-imdb.com/vi3500918553... | 81.0 | "Action Epic, Adventure Epic, Spy, Action, Adv... | Christopher McQuarrie | Ethan Hunt and his IMF team must track down a ... | "Bruce Geller, Christopher McQuarrie, Erik Jen... | "Tom Cruise, Hayley Atwell, Ving Rhames" |
| 2 | 3 | John Wick: Chapter 4 | 7.6 | 392000.0 | https://www.imdb.com/title/tt10366206/mediavie... | https://imdb-video.media-imdb.com/vi289916185/... | 78.0 | "Action Epic, Gun Fu, One,Person Army Action, ... | Chad Stahelski | "John Wick uncovers a path to defeating The Hi... | "Shay Hatten, Michael Finch, Derek Kolstad" | "Keanu Reeves, Laurence Fishburne, George Geor... |
| 3 | 4 | Watchmen | 7.6 | 603000.0 | https://www.imdb.com/title/tt0409459/mediaview... | https://imdb-video.media-imdb.com/vi240565017/... | 56.0 | "Dystopian Sci,Fi, Superhero, Action, Drama, M... | Zack Snyder | "In a version of 1985 where superheroes exist-... | "Dave Gibbons, David Hayter, Alex Tse" | "Jackie Earle Haley, Patrick Wilson, Carla Gug... |
| 4 | 5 | The Fifth Element | 7.6 | 533000.0 | https://www.imdb.com/title/tt0119116/mediaview... | https://imdb-video.media-imdb.com/vi854720793/... | 52.0 | "Sci,Fi Epic, Space Sci,Fi, Action, Adventure,... | Luc Besson | "In the colorful future- a cab driver unwittin... | "Luc Besson, Robert Mark Kamen" | "Bruce Willis, Milla Jovovich, Gary Oldman" |


In [139]:
df.columns


Index(['Rank', 'Title', 'IMDb Rating', 'Votes', 'Poster URL', 'Video URL',
       'Meta Score', 'Tags', 'Director', 'Description', 'Writers', 'Stars'],
      dtype='object')