# 📊 IMDB Top 1000 Movies - Exploratory Data Analysis (EDA)

## 📌 Project Overview
This notebook explores the **IMDB Top 1000 Movies dataset** to analyze:
- Movie ratings distribution
- Most popular genres
- Revenue trends over time
- Relationship between IMDB ratings and box office performance


## 📌 Step 1: Load the Dataset

In [None]:

import pandas as pd

# Load dataset
file_path = "imdb_top_1000.csv"
df = pd.read_csv(file_path)

# Display first few rows
df.head()


## 📌 Step 2: Data Cleaning

In [None]:

# Convert Released_Year to a nullable integer; non-numeric entries become <NA>
df["Released_Year"] = df["Released_Year"].str.extract(r"(\d+)", expand=False).astype(float).astype("Int64")

# Convert Gross revenue to numeric format (strip thousands separators first)
df["Gross"] = df["Gross"].str.replace(",", "", regex=False).astype(float)

# Fill missing Certificate values with "Unknown"
df["Certificate"] = df["Certificate"].fillna("Unknown")

# Drop rows where Meta_score or Gross is missing
df.dropna(subset=["Meta_score", "Gross"], inplace=True)

# Verify cleaned data
df.info()
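
The cleaning steps above can be checked on a tiny frame before running them on the full file. This is a minimal sketch on made-up rows, assuming the raw columns arrive as strings in the formats shown (a year column that may contain a non-numeric entry, and revenue with thousands separators); the values are illustrative and not taken from the dataset.

```python
import pandas as pd

# Made-up rows that mimic the assumed raw formats (not real dataset values).
sample = pd.DataFrame({
    "Released_Year": ["1994", "PG", "2008"],
    "Gross": ["28,341,469", float("nan"), "534,858,444"],
    "Meta_score": [80.0, 77.0, float("nan")],
})

# Non-numeric years such as "PG" become <NA> instead of raising an error.
years = (
    sample["Released_Year"]
    .str.extract(r"(\d+)", expand=False)
    .astype(float)
    .astype("Int64")
)

# Thousands separators are removed before the cast; missing values stay NaN.
gross = sample["Gross"].str.replace(",", "", regex=False).astype(float)

cleaned = sample.assign(Released_Year=years, Gross=gross)

# dropna defaults to how="any": a row is dropped if either column is missing.
kept = cleaned.dropna(subset=["Meta_score", "Gross"])
print(kept)
```

Note that `dropna` with a `subset` removes a row when any of the listed columns is missing, so the surviving sample is smaller than the original file.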


## 📌 Step 3: Exploratory Data Analysis (EDA)

### 🎬 IMDB Ratings Distribution

In [None]:

import matplotlib.pyplot as plt

# Histogram for IMDB Ratings
plt.figure(figsize=(8, 5))
plt.hist(df["IMDB_Rating"], bins=10, color="skyblue", edgecolor="black")
plt.xlabel("IMDB Rating")
plt.ylabel("Number of Movies")
plt.title("Distribution of IMDB Ratings")
plt.grid(axis="y", linestyle="--", alpha=0.7)
plt.show()
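
A histogram shows the shape of the distribution, but the share of movies in a rating band can also be computed directly. The sketch below uses a made-up `ratings` series as a stand-in for `df["IMDB_Rating"]`; the numbers are illustrative only.

```python
import pandas as pd

# Made-up ratings for illustration only (not taken from the dataset).
ratings = pd.Series([7.6, 7.8, 7.9, 8.0, 8.3, 9.0], name="IMDB_Rating")

# Share of movies rated between 7.0 and 8.0 (both bounds inclusive by default).
share_7_to_8 = ratings.between(7.0, 8.0).mean()

# Number of movies rated 9.0 or higher.
n_9_plus = int((ratings >= 9.0).sum())

print(f"Share rated 7.0-8.0: {share_7_to_8:.1%}")
print(f"Movies rated 9.0+: {n_9_plus}")
print(ratings.describe())
```

On the notebook's data, replacing `ratings` with `df["IMDB_Rating"]` gives the corresponding figures for the cleaned dataset.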


### 🎭 Most Common Movie Genres

In [None]:

# Bar chart of the 10 most frequent Genre values
# (each value is a comma-separated combination such as "Drama, Romance")
plt.figure(figsize=(10, 5))
df["Genre"].value_counts().head(10).plot(kind="bar", color="orange", edgecolor="black")
plt.xlabel("Genre combination")
plt.ylabel("Count")
plt.title("Top 10 Most Common Genre Combinations")
plt.xticks(rotation=45)
plt.grid(axis="y", linestyle="--", alpha=0.7)
plt.show()
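
Calling `value_counts()` on the raw `Genre` column counts each full string, so a movie tagged with several genres is counted under one combined label. A minimal sketch of counting individual genres instead, assuming the column holds comma-separated strings; the rows below are made up for illustration.

```python
import pandas as pd

# Made-up rows in the assumed "A, B, C" format of the Genre column.
sample = pd.DataFrame(
    {"Genre": ["Drama", "Drama, Romance", "Comedy, Drama", "Action, Crime"]}
)

genre_counts = (
    sample["Genre"]
    .str.split(",")   # "Drama, Romance" -> ["Drama", " Romance"]
    .explode()        # one row per (movie, genre) pair
    .str.strip()      # remove the space left after each comma
    .value_counts()
)
print(genre_counts.head(10))
```

With this approach a movie contributes one count to each of its genres, so the totals add up to more than the number of movies.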


### 💰 IMDB Ratings vs. Box Office Revenue

In [None]:

# Scatter plot: IMDB Rating vs Gross Revenue
plt.figure(figsize=(8, 5))
plt.scatter(df["IMDB_Rating"], df["Gross"], alpha=0.6, color="green")
plt.xlabel("IMDB Rating")
plt.ylabel("Box Office Revenue ($)")
plt.title("IMDB Ratings vs. Box Office Revenue")
plt.grid(alpha=0.5)
plt.show()
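
A scatter plot gives a visual impression of the relationship; a correlation coefficient puts a number on it. The sketch below computes Pearson correlation and a rank-based (Spearman) correlation on made-up values. Spearman is obtained here as the Pearson correlation of the ranks, which avoids any dependency beyond pandas.

```python
import pandas as pd

# Made-up values for illustration only (not taken from the dataset).
sample = pd.DataFrame({
    "IMDB_Rating": [7.6, 7.8, 8.0, 8.4, 9.0],
    "Gross": [5.0e6, 2.5e8, 1.2e8, 3.0e8, 2.8e7],
})

# Pearson measures linear association and is sensitive to extreme revenues.
pearson = sample["IMDB_Rating"].corr(sample["Gross"])

# Spearman is the Pearson correlation of the ranks; it is more robust
# to the heavy right tail that revenue figures typically have.
spearman = sample["IMDB_Rating"].rank().corr(sample["Gross"].rank())

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

Because revenue is usually strongly right-skewed, the rank-based value is often the more informative of the two for this kind of comparison.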


## 📌 Step 4: Key Findings & Conclusion


### 🔹 Key Findings:
- Most movies are rated **between 7.0 and 8.0**, with few reaching **9.0+**.
- **Drama** is the dominant genre, often combined with others such as **Romance** and **Comedy**.
- **Movies rated around 8.0 tend to earn more at the box office** than those rated 9.0+.
- **Steven Spielberg is the most frequently featured director** in this dataset (13 movies).
- **Box office revenue has significantly increased post-2010**, with **2018 having the highest average revenue per movie (~$216.8M)**.
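
The director and revenue-by-year findings can be reproduced with two short aggregations. This is a minimal sketch on made-up rows; `Director` is an assumed column name, since no earlier cell references it, and the values are placeholders rather than dataset figures.

```python
import pandas as pd

# Made-up rows for illustration; "Director" is an assumed column name.
sample = pd.DataFrame({
    "Director": ["A", "B", "A", "C", "A"],
    "Released_Year": pd.array([2009, 2012, 2012, 2018, 2018], dtype="Int64"),
    "Gross": [1.0e8, 2.0e8, 4.0e8, 5.0e8, 3.0e8],
})

# Number of movies per director, most frequent first.
director_counts = sample["Director"].value_counts()

# Average gross revenue per release year, highest first.
avg_gross_by_year = (
    sample.groupby("Released_Year")["Gross"]
    .mean()
    .sort_values(ascending=False)
)

print(director_counts.head(5))
print(avg_gross_by_year.head(5))
```

Years with only a handful of movies can dominate a per-year average, so it is worth checking the group sizes (for example with `groupby("Released_Year").size()`) before reading much into a single peak year.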

### 📌 Next Steps:
- Extend analysis to compare IMDB ratings with **Rotten Tomatoes scores**.
- Apply **Machine Learning** to predict **box office success based on movie features**.
