
# IMDb Web Scraping — Extract First 5 Movie Titles

This notebook demonstrates how to **scrape the first five movie names** from an IMDb search page using **Python**, `requests`, and `BeautifulSoup`.

---



## 🧩 Problem Description

Write a script that scrapes the **first five movie names** from a given IMDb URL.

**Instructions:**
- Use the variable name `url` for the IMDb link.
- Implement the function `imdb(url)`.
- Return a list of the first 5 movie names.

**Hint:** Inspect IMDb's HTML structure using browser developer tools to locate where movie titles appear.
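
Besides the browser's developer tools, one quick way to see which heading classes a saved page uses is to count them with BeautifulSoup. The sketch below is illustrative only: the helper name `h3_class_counts` and the inline HTML are made up for the example and are not taken from IMDb.

```python
from collections import Counter
from bs4 import BeautifulSoup

def h3_class_counts(html):
    """Count how often each class combination appears on <h3> tags (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    counts = Counter()
    for h3 in soup.find_all("h3"):
        # BeautifulSoup exposes the class attribute as a list of class names
        classes = " ".join(h3.get("class", [])) or "(no class)"
        counts[classes] += 1
    return counts

# Made-up HTML standing in for a saved search page
sample_html = """
<h3 class="lister-item-header"><a href="/title/tt0000001/">Example One</a></h3>
<h3 class="lister-item-header"><a href="/title/tt0000002/">Example Two</a></h3>
<h3>Sidebar heading</h3>
"""

class_counts = h3_class_counts(sample_html)
print(class_counts)
```

A class that repeats once per result row is a good candidate for the title selector.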

**Sample URL:**
```
https://www.imdb.com/search/title/?year=2011&title_type=feature&
```
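
The query string of the sample URL carries the search filters. As a small sketch using only the standard library, the URL can be split to show which parameters it sets; note that the trailing `&` does not add a parameter.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://www.imdb.com/search/title/?year=2011&title_type=feature&"

parts = urlsplit(url)
# parse_qs skips the empty field left by the trailing "&"
params = parse_qs(parts.query)

print(parts.path)
print(params)
```

Here `year` and `title_type` are the only filters in play, so the page lists feature films from 2011.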



## Step 1: Import Required Libraries

We'll use:
- `requests` → To fetch the IMDb webpage.
- `BeautifulSoup` (from `bs4`) → To parse and extract movie titles from HTML.


In [None]:

from bs4 import BeautifulSoup as bs
import requests



## Step 2: Define the `imdb(url)` Function

This function will:
1. Download the webpage.
2. Parse the HTML.
3. Extract movie titles from the `<a>` link inside each `<h3 class="lister-item-header">` tag.
4. Return the first 5 titles.


In [None]:

def imdb(url):
    # 1. Download the webpage
    response = requests.get(url)

    # 2. Parse the HTML
    soup = bs(response.text, 'html.parser')

    # 3. Find the title headers and read the link text of the first five
    movie_tags = soup.find_all('h3', class_='lister-item-header')
    movies = []
    for tag in movie_tags[:5]:
        movie_name = tag.a.text.strip()
        movies.append(movie_name)

    # 4. Return the list of the first five titles
    return movies



## Step 3: Test the Function

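Now let's test the function using the sample IMDb URL provided in the problem statement. The call itself appears at the end of the updated cell in the next section, because the version above returns an empty list when IMDb serves a different layout or blocks the request.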
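
Because a live request depends on IMDb's current markup, it can help to check the parsing logic offline first. The following is a minimal sketch, not the notebook's original test: `titles_from_html` is a hypothetical helper that mirrors the parsing steps of `imdb(url)`, and the HTML is made up to imitate the `lister-item-header` structure.

```python
from bs4 import BeautifulSoup

def titles_from_html(html, limit=5):
    """Apply the Step 2 parsing logic to an HTML string (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    headers = soup.find_all("h3", class_="lister-item-header")
    # Take the text of the first <a> inside each header, as imdb(url) does
    return [h.a.text.strip() for h in headers[:limit]]

# Made-up markup: seven result rows, so the limit of five is exercised
rows = "".join(
    f'<h3 class="lister-item-header"><span>{i}.</span>'
    f'<a href="/title/tt000000{i}/"> Example Movie {i} </a></h3>'
    for i in range(1, 8)
)

result = titles_from_html(rows)
print(result)
```

If this passes but the live call returns nothing, the likely cause is the page markup or a blocked request rather than the extraction loop.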



### 🔧 Update — Fix for IMDb HTML Changes

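IMDb changes its page structure from time to time, and it may block automated requests that do not look like they come from a browser.

**Fixes applied:**
- Added a `User-Agent` header so the request looks like it comes from a browser.
- Broadened the selector to match both `h3.lister-item-header a` (older layout) and `h3.ipc-title__text` (newer layout).
- Because either selector can match, the code keeps working across these two layouts; a further redesign would need a new selector.

✅ With these changes the script returns the first five movie titles, provided the page uses one of the two layouts and the request is not blocked.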
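
To see how a comma-separated selector covers two layouts, here is a small sketch run against two made-up fragments, one per layout. The fragments are assumptions for illustration, including the rank prefix in the second one; they are not copied from IMDb, and `first_titles` is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Either part of the selector may match; unmatched parts are simply ignored
SELECTOR = "h3.lister-item-header a, h3.ipc-title__text"

# Made-up fragments, one per layout
old_layout = '<h3 class="lister-item-header"><span>1.</span><a href="#">Example Old</a></h3>'
new_layout = '<a href="#"><h3 class="ipc-title__text">1. Example New</h3></a>'

def first_titles(html, limit=5):
    """Return the stripped text of the first `limit` matches (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.text.strip() for tag in soup.select(SELECTOR)[:limit]]

old_titles = first_titles(old_layout)
new_titles = first_titles(new_layout)
print(old_titles)
print(new_titles)
```

In the older fragment the selector targets the link, so the rank `<span>` is left out; in the newer fragment the whole heading text is returned, rank included.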


In [None]:

from bs4 import BeautifulSoup
import requests

def imdb(url):
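    # Send a browser-like User-Agent header so IMDb is less likely to block the request
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=10)
    # Raise on a blocked or failed request (4xx/5xx) instead of silently returning []
    response.raise_for_status()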

    # Parse HTML
    soup = BeautifulSoup(response.text, 'html.parser')

    # Select movie title elements using broader selectors
    movie_tags = soup.select('h3.lister-item-header a, h3.ipc-title__text')

    # Extract first 5 movie titles
    movies = []
    for tag in movie_tags[:5]:
        title = tag.text.strip()
        movies.append(title)

    return movies

# Test the function
url = "https://www.imdb.com/search/title/?year=2011&title_type=feature&"
print(imdb(url))
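
Headings matched by `h3.ipc-title__text` may include a rank before the title (for example `1. `). This appears to be the case on some versions of the page but should be checked against the live output. If it is, a small post-processing step can strip it. The sketch below assumes the rank format `N. ` and uses a hypothetical helper, `strip_rank`, with made-up titles.

```python
import re

# Matches a leading rank such as "1. " or "12. " (assumed format)
_RANK_PREFIX = re.compile(r"^\d+\.\s+")

def strip_rank(title):
    """Remove a leading 'N. ' rank from a title, if present (hypothetical helper)."""
    return _RANK_PREFIX.sub("", title, count=1)

# Made-up examples: two ranked titles, a purely numeric title, an unranked title
raw = ["1. Example Movie", "2. Another Example", "2012", "Example Without Rank"]
cleaned = [strip_rank(t) for t in raw]
print(cleaned)
```

One caveat of this approach: a title that genuinely begins with a number, a period, and a space would also be trimmed, so it is only worth applying when the rank is known to be present.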
