# **IMDB TOP 10 MOVIES SCRAPER SCRIPT**


This project scrapes the top 10 movies from IMDb’s Top Rated Movies chart using Python, BeautifulSoup, and pandas.

📌 **Goal:** Demonstrate entry-level web scraping, data cleaning, and basic data presentation.

🔧 **Tools Used:**
- requests
- BeautifulSoup
- pandas


Step-1: Import the required libraries: requests, BeautifulSoup, and pandas.

In [6]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

Step-2: Declare the variables used later: the URL, the number of rows to fetch, and the output file name.

In [None]:
# Variable declaration
# IMDb Top 250 URL (as of 2025)
url = "https://www.imdb.com/chart/top/"
# Number of rows to fetch (useful for testing purposes)
NUM_ROWS = 10
# Output file name
FILE_NAME = f"imdb_top_{NUM_ROWS}_updated.csv"

Step-3: Request the page and check the response. If the status code is not 200, stop. Otherwise, parse the returned HTML with BeautifulSoup.

In [7]:
# Headers to prevent bot blocking
headers = {
    "Accept-Language": "en-US,en;q=0.5",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"
}

# Send the request (the timeout keeps the call from hanging indefinitely)
response = requests.get(url, headers=headers, timeout=10)

# Stop early unless the server returned the page (HTTP 200)
if response.status_code != 200:
    print("❌ Failed to retrieve the page. Status code:", response.status_code)
    raise SystemExit(1)

# Parse the HTML content
soup = BeautifulSoup(response.text, 'html.parser')
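
The status check in this step can also be written as a small helper that raises an exception instead of stopping the interpreter, which makes it easier to reuse outside a notebook. The sketch below is illustrative only: `ensure_ok` and `PageFetchError` are hypothetical names, not part of `requests`, and the helper assumes only that the object has a `status_code` attribute, as a `requests.Response` does.

```python
from types import SimpleNamespace


class PageFetchError(RuntimeError):
    """Hypothetical error type for an unexpected status (not part of requests)."""


def ensure_ok(response, expected_status=200):
    """Return the response unchanged if its status matches, else raise."""
    if response.status_code != expected_status:
        raise PageFetchError(
            f"Failed to retrieve the page. Status code: {response.status_code}"
        )
    return response


# Stand-in objects so the sketch runs without a network call;
# in the notebook you would pass the result of requests.get(...) instead.
ok_response = SimpleNamespace(status_code=200, text="<html></html>")
blocked_response = SimpleNamespace(status_code=403, text="")

checked = ensure_ok(ok_response)

try:
    ensure_ok(blocked_response)
    error_message = None
except PageFetchError as exc:
    error_message = str(exc)

print(error_message)
```

`requests` also provides `response.raise_for_status()`, which raises `requests.HTTPError` for 4xx and 5xx responses and may be enough if a custom message is not needed.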

Step-4: In the parsed HTML, find all list items (**li**) with the class name **ipc-metadata-list-summary-item**.

In [8]:
# Find all movie containers
movie_containers = soup.find_all("li", class_="ipc-metadata-list-summary-item")

# find_all returns an empty list (never None) when nothing matches
if not movie_containers:
    print("❌ No movie containers found on the page.")
    raise SystemExit(1)
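
`find_all` returns an empty list-like result, not `None`, when nothing matches, so an emptiness check is what catches a missing container. Here is a minimal sketch on a made-up HTML snippet. The markup is a simplified assumption that only reuses the class name from this step; it is not IMDb's actual page.

```python
from bs4 import BeautifulSoup

# Simplified, made-up markup that only mimics the class name used above.
sample_html = """
<ul>
  <li class="ipc-metadata-list-summary-item">Movie A</li>
  <li class="ipc-metadata-list-summary-item">Movie B</li>
</ul>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

found = sample_soup.find_all("li", class_="ipc-metadata-list-summary-item")
missing = sample_soup.find_all("li", class_="no-such-class")

print(len(found))       # 2
print(missing is None)  # False
print(len(missing))     # 0
```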

Step-5: Loop over the items in the movie containers. For each item, use the HTML structure to find the elements holding the required fields (title, rating, and year), build a dictionary from them, and append it to the movies data list.

In [9]:
# Prepare list to hold extracted data
movies_data = []

# Loop through each movie item
for index, item in enumerate(movie_containers[:NUM_ROWS], start=1):
    title_tag = item.find("h3", class_="ipc-title__text")
    rating_tag = item.find("span", class_="ipc-rating-star")
    year_tag = item.find("span", class_="cli-title-metadata-item")  # Usually the first metadata span

    if title_tag and rating_tag and year_tag:
        # Clean title (remove rank number)
        raw_title = title_tag.get_text(strip=True)
        title = raw_title.split('. ', 1)[-1]

        rating = rating_tag.get_text(strip=True)
        year = year_tag.get_text(strip=True).strip("()")

        movies_data.append({
            "Rank": index,
            "Title": title,
            "Year": year,
            "Rating": rating
        })
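
The text of the rating element can hold both the score and the vote count in one string, such as `9.3(3M)`. One way to separate them is a small regular expression. `split_rating` below is a hypothetical helper, and the `score(votes)` pattern is assumed from the sample values rather than guaranteed by IMDb.

```python
import re

# Assumed pattern: a decimal score, then an optional vote count in parentheses.
RATING_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(?:\(([^)]*)\))?\s*$")


def split_rating(raw):
    """Split a string like '9.3(3M)' into (9.3, '3M'); votes may be None."""
    match = RATING_PATTERN.match(raw)
    if match is None:
        raise ValueError(f"Unexpected rating format: {raw!r}")
    score, votes = match.groups()
    return float(score), votes


score, votes = split_rating("9.3(3M)")
score_only = split_rating("8.8")

print(score, votes)  # 9.3 3M
print(score_only)    # (8.8, None)
```

Storing the score as a float in its own column would make it possible to sort or filter on it later without further string handling.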

Step-6: Finally, convert the movies data list to a DataFrame and export it to a CSV file for further processing.

In [10]:
# Convert the data to DataFrame for easier view
df = pd.DataFrame(movies_data)

# convert dataframe to csv file
df.to_csv(FILE_NAME, index=False)

print("✅ Scraped", len(df), "movies successfully.")
print(f"📁 Saved to {FILE_NAME}")
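
Passing `index=False` keeps the DataFrame's row index out of the file. If the index is written (the default) and the file is then read back with `pd.read_csv`, it shows up as an extra column named `Unnamed: 0`. The sketch below shows the difference using in-memory buffers and a made-up row instead of the scraped data.

```python
import io

import pandas as pd

# Made-up sample row, not scraped data.
rows = [{"Rank": 1, "Title": "Example Movie", "Year": "1999", "Rating": "8.5"}]
sample_df = pd.DataFrame(rows)

# Write once without the index and once with it (the default).
buf_no_index = io.StringIO()
sample_df.to_csv(buf_no_index, index=False)

buf_with_index = io.StringIO()
sample_df.to_csv(buf_with_index)

# Read both CSV texts back and compare the resulting column names.
cols_without_index = list(pd.read_csv(io.StringIO(buf_no_index.getvalue())).columns)
cols_with_index = list(pd.read_csv(io.StringIO(buf_with_index.getvalue())).columns)

print(cols_without_index)  # ['Rank', 'Title', 'Year', 'Rating']
print(cols_with_index)     # ['Unnamed: 0', 'Rank', 'Title', 'Year', 'Rating']
```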


Unnamed: 0,Rank,Title,Year,Rating
0,1,The Shawshank Redemption,1994,9.3(3M)
1,2,The Godfather,1972,9.2(2.1M)
2,3,The Dark Knight,2008,9.0(3M)
3,4,The Godfather Part II,1974,9.0(1.4M)
4,5,12 Angry Men,1957,9.0(920K)
5,6,The Lord of the Rings: The Return of the King,2003,9.0(2.1M)
6,7,Schindler's List,1993,9.0(1.5M)
7,8,Pulp Fiction,1994,8.9(2.3M)
8,9,The Lord of the Rings: The Fellowship of the Ring,2001,8.9(2.1M)
9,10,"The Good, the Bad and the Ugly",1966,8.8(849K)


✅ Scraped 10 movies successfully.
📁 Saved to imdb_top_10_updated.csv


## What I Learned

- How to send HTTP requests with `requests` and parse the HTML response with `BeautifulSoup`.
- How to navigate real-world HTML structures for data scraping.
- How to clean and structure scraped data with pandas.
- How to export the data to a CSV file for further use.

This project helped me understand the fundamentals of web scraping while keeping the code simple and clean.


## ⚖️ Disclaimer

*This project is for educational and portfolio purposes only. All movie data belongs to [IMDb](https://www.imdb.com).  
© 2025 Shivam Garg. Feel free to reuse this code with proper credit.*