## **IMDb Movies**

**IMDb (Internet Movie Database)** is an online platform that provides information related to movies, TV shows, actors, directors, release dates, ratings, reviews, and other entertainment industry details. It is one of the most popular and trusted sources for film-related data worldwide.

### **Description of Libraries**

**Requests:**

The requests library sends HTTP requests to websites and returns the page content (HTML) for scraping.

**lxml:**

lxml is a fast HTML/XML parser. In this notebook it is the parser backend that BeautifulSoup uses to build the parse tree of the page.

**BeautifulSoup (bs4):**

BeautifulSoup parses HTML content and provides search methods like `find_all` for pulling out specific data (movie names, ratings, release years) from web pages.

**CSV:**

The csv module reads and writes CSV (Comma-Separated Values) files. It is imported here, but the final export is done with pandas.

**Pandas:**

pandas organizes the scraped data into a structured table called a DataFrame and exports it to CSV for analysis.
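For comparison with the pandas export used later, here is a minimal sketch of writing the same three columns with the standard-library `csv` module. The two sample rows are copied from the scraped output further down; writing to an in-memory `io.StringIO` buffer instead of a file is an assumption made so the example runs anywhere.

```python
import csv
import io

# Sample rows in the same (name, year, rating) shape the notebook scrapes
rows = [
    ("The Shawshank Redemption", "1994", "9.3"),
    ("The Dark Knight", "2008", "9.1"),
]

# Write to an in-memory buffer; use open("movies.csv", "w", newline="") for a real file
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Movie Name", "Year", "Rating"])  # header row
writer.writerows(rows)  # one CSV line per tuple

csv_text = buffer.getvalue()
print(csv_text)
```

By default `csv.writer` quotes any field that contains a comma, so a title with a comma still stays in a single column.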

In [1]:
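# Import the libraries for fetching, parsing, and exporting data
import requests
import lxml  # not called directly; importing it confirms the parser BeautifulSoup uses is installed
from bs4 import BeautifulSoup as bs
import csv  # imported for CSV handling; the export below uses pandas instead
import pandas as pd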

In [2]:
url = "https://www.imdb.com/search/title/?groups=top_100&sort=num_votes,desc"

In [3]:
header = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/145.0.0.0 Safari/537.36"}

In [4]:
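# Send a GET request to the IMDb search page with a browser-like User-Agent
# (timeout stops the request from hanging indefinitely)

response = requests.get(url, headers=header, timeout=10)

if response.status_code == 200:
    html_content = response.text
else:
    print(f"Error occurred: HTTP {response.status_code}")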
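IMDb may rate-limit or temporarily fail requests, in which case the cell above only prints an error. A hedged sketch using a `requests.Session` with urllib3's `Retry` is shown below; the retry count, back-off factor, and status codes are illustrative assumptions rather than values documented by IMDb, and `make_session` is a hypothetical helper name.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(user_agent: str, retries: int = 3) -> requests.Session:
    """Build a Session that sends a fixed User-Agent and retries transient failures."""
    retry = Retry(
        total=retries,
        backoff_factor=0.5,  # exponential back-off between attempts
        status_forcelist=[429, 500, 502, 503, 504],  # rate limits and server errors
    )
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    session.mount("https://", HTTPAdapter(max_retries=retry))  # applies to all https:// URLs
    return session


session = make_session("Mozilla/5.0 (Windows NT 10.0; Win64; x64)")

# Usage (needs network access):
# response = session.get(url, timeout=10)
# response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
# html_content = response.text
```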

In [5]:
html_content



In [6]:
soup = bs(html_content, "lxml")
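`bs(html_content, "lxml")` raises `bs4.FeatureNotFound` when lxml is not installed. A small sketch that falls back to Python's built-in `html.parser` in that case; the helper name `make_soup` and the one-line sample markup are assumptions for illustration.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup: str) -> BeautifulSoup:
    """Parse with lxml when it is installed, otherwise use Python's built-in parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # Raised when the requested parser library is not available
        return BeautifulSoup(markup, "html.parser")


sample = '<h3 class="ipc-title__text">1. The Shawshank Redemption</h3>'
soup_demo = make_soup(sample)
print(soup_demo.h3.get_text())
```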

In [7]:
soup

<!DOCTYPE html>
<html lang="en-US" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://opengraphprotocol.org/schema/"><head><meta charset="utf-8"/><meta content="width=device-width" name="viewport"/><script>if(typeof uet === 'function'){ uet('bb', 'LoadTitle', {wb: 1}); }</script><script>window.addEventListener('load', (event) => {
        if (typeof window.csa !== 'undefined' && typeof window.csa === 'function') {
            var csaLatencyPlugin = window.csa('Content', {
                element: {
                    slotId: 'LoadTitle',
                    type: 'service-call'
                }
            });
            csaLatencyPlugin('mark', 'clickToBodyBegin', 1771590814998);
        }
    })</script><title>Advanced search</title><meta content="" data-id="main" name="description"/><meta content="0cadf7898134e79b" name="google-site-verification"/><meta content="C1DACEF2769068C0B0D2687C9E5105FA" name="msvalidate.01"/><meta content="max-image-preview:large" name="robots

In [8]:
soup.prettify()



In [9]:
movie_name = soup.find_all("h3", class_="ipc-title__text")  # find_all returns every <h3> with this class
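The same lookup can be written as a CSS selector with `select`. The sketch below runs both forms on a tiny invented snippet (not real IMDb markup); the third heading, with a different class, is made up to show that it is excluded.

```python
from bs4 import BeautifulSoup

# Invented stand-in for the IMDb page; the real markup is much larger
html = """
<ul>
  <li><h3 class="ipc-title__text">1. The Shawshank Redemption</h3></li>
  <li><h3 class="ipc-title__text">2. The Dark Knight</h3></li>
  <li><h3 class="other-heading">Recently viewed</h3></li>
</ul>
"""
soup_demo = BeautifulSoup(html, "html.parser")

# Both calls return only the <h3> tags carrying the ipc-title__text class
by_find_all = soup_demo.find_all("h3", class_="ipc-title__text")
by_select = soup_demo.select("h3.ipc-title__text")

titles = [tag.get_text(strip=True) for tag in by_select]
print(titles)
```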

In [10]:
movie_name

[<h3 class="ipc-title__text">1. The Shawshank Redemption</h3>,
 <h3 class="ipc-title__text">2. The Dark Knight</h3>,
 <h3 class="ipc-title__text">3. Inception</h3>,
 <h3 class="ipc-title__text">4. Fight Club</h3>,
 <h3 class="ipc-title__text">5. Interstellar</h3>,
 <h3 class="ipc-title__text">6. Forrest Gump</h3>,
 <h3 class="ipc-title__text">7. Pulp Fiction</h3>,
 <h3 class="ipc-title__text">8. The Matrix</h3>,
 <h3 class="ipc-title__text">9. The Godfather</h3>,
 <h3 class="ipc-title__text">10. The Lord of the Rings: The Fellowship of the Ring</h3>,
 <h3 class="ipc-title__text">11. The Lord of the Rings: The Return of the King</h3>,
 <h3 class="ipc-title__text">12. Se7en</h3>,
 <h3 class="ipc-title__text">13. The Dark Knight Rises</h3>,
 <h3 class="ipc-title__text">14. The Lord of the Rings: The Two Towers</h3>,
 <h3 class="ipc-title__text">15. Django Unchained</h3>,
 <h3 class="ipc-title__text">16. Gladiator</h3>,
 <h3 class="ipc-title__text">17. Inglourious Basterds</h3>,
 <h3 class

In [11]:
movie_names = []

for tag in movie_name[:20]:
    text = tag.text.strip()
    name = text.split(". ", 1)[1]  # split only at the first ". " so titles containing ". " stay intact
    movie_names.append(name)

print(movie_names)

['The Shawshank Redemption', 'The Dark Knight', 'Inception', 'Fight Club', 'Interstellar', 'Forrest Gump', 'Pulp Fiction', 'The Matrix', 'The Godfather', 'The Lord of the Rings: The Fellowship of the Ring', 'The Lord of the Rings: The Return of the King', 'Se7en', 'The Dark Knight Rises', 'The Lord of the Rings: The Two Towers', 'Django Unchained', 'Gladiator', 'Inglourious Basterds', 'The Silence of the Lambs', 'Joker', 'Saving Private Ryan']
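Splitting on the first `". "` works for these titles. A slightly stricter sketch removes only a leading numeric rank with a regular expression, so headings without a rank pass through unchanged. The helper `strip_rank` is hypothetical, and the second sample title is used only to show a title that itself contains `". "`.

```python
import re

# Matches a leading rank such as "12. " and captures everything after it
RANK_PREFIX = re.compile(r"^\s*\d+\.\s+(.*)$")


def strip_rank(heading: str) -> str:
    """Remove the 'N. ' rank prefix; return the text unchanged if there is none."""
    match = RANK_PREFIX.match(heading)
    return match.group(1).strip() if match else heading.strip()


print(strip_rank("12. Se7en"))
print(strip_rank("3. Mr. Smith Goes to Washington"))
print(strip_rank("Recently viewed"))
```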


In [12]:
movie_year = soup.find_all("span", class_="dli-title-metadata-item")  # match one class; generated classes such as iMumIM are likely to change

In [13]:
movie_year

[<span class="sc-a55f6282-6 iMumIM dli-title-metadata-item">1994</span>,
 <span class="sc-a55f6282-6 iMumIM dli-title-metadata-item">2h 22m</span>,
 <span class="sc-a55f6282-6 iMumIM dli-title-metadata-item">R</span>,
 <span class="sc-a55f6282-6 iMumIM dli-title-metadata-item">2008</span>,
 <span class="sc-a55f6282-6 iMumIM dli-title-metadata-item">2h 32m</span>,
 <span class="sc-a55f6282-6 iMumIM dli-title-metadata-item">PG-13</span>,
 <span class="sc-a55f6282-6 iMumIM dli-title-metadata-item">2010</span>,
 <span class="sc-a55f6282-6 iMumIM dli-title-metadata-item">2h 28m</span>,
 <span class="sc-a55f6282-6 iMumIM dli-title-metadata-item">PG-13</span>,
 <span class="sc-a55f6282-6 iMumIM dli-title-metadata-item">1999</span>,
 <span class="sc-a55f6282-6 iMumIM dli-title-metadata-item">2h 19m</span>,
 <span class="sc-a55f6282-6 iMumIM dli-title-metadata-item">R</span>,
 <span class="sc-a55f6282-6 iMumIM dli-title-metadata-item">2014</span>,
 <span class="sc-a55f6282-6 iMumIM dli-title-me

In [14]:
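movie_years = []

# Each movie contributes three metadata items (year, runtime, certificate),
# so the year is every third item. This breaks if any movie is missing a field.
for i in range(20):
    year = movie_year[i * 3].text.strip()
    movie_years.append(year)

print(movie_years)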

['1994', '2008', '2010', '1999', '2014', '1994', '1994', '1999', '1972', '2001', '2003', '1995', '2012', '2002', '2012', '2000', '2009', '1991', '2019', '1998']
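The every-third-item indexing assumes each movie has exactly three metadata items. A sketch of an alternative keeps only the items that look like a four-digit year (1900 to 2099 assumed). The sample list is shortened from the output above, with the second film's certificate deliberately removed (an invented gap) to show how fixed-stride indexing drifts.

```python
import re

# Flat metadata texts in page order; "PG-13" for the second film was removed on purpose
metadata = ["1994", "2h 22m", "R", "2008", "2h 32m", "2010", "2h 28m", "PG-13"]

YEAR = re.compile(r"(19|20)\d{2}")

# Fixed stride picks up a runtime once a field is missing
by_stride = [metadata[i * 3] for i in range(3)]

# Pattern-based filtering keeps only year-like items
years = [item for item in metadata if YEAR.fullmatch(item)]

print(by_stride)
print(years)
```

This is still a heuristic: it would miss a movie with no year at all, and ranges such as `2011–2019` for series would not match.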


In [15]:
movie_rating = soup.find_all("span", class_ = "ipc-rating-star--rating")

In [16]:
movie_rating

[<span class="ipc-rating-star--rating">9.3</span>,
 <span class="ipc-rating-star--rating">9.1</span>,
 <span class="ipc-rating-star--rating">8.8</span>,
 <span class="ipc-rating-star--rating">8.8</span>,
 <span class="ipc-rating-star--rating">8.7</span>,
 <span class="ipc-rating-star--rating">8.8</span>,
 <span class="ipc-rating-star--rating">8.8</span>,
 <span class="ipc-rating-star--rating">8.7</span>,
 <span class="ipc-rating-star--rating">9.2</span>,
 <span class="ipc-rating-star--rating">8.9</span>,
 <span class="ipc-rating-star--rating">9.0</span>,
 <span class="ipc-rating-star--rating">8.6</span>,
 <span class="ipc-rating-star--rating">8.4</span>,
 <span class="ipc-rating-star--rating">8.8</span>,
 <span class="ipc-rating-star--rating">8.5</span>,
 <span class="ipc-rating-star--rating">8.5</span>,
 <span class="ipc-rating-star--rating">8.4</span>,
 <span class="ipc-rating-star--rating">8.6</span>,
 <span class="ipc-rating-star--rating">8.3</span>,
 <span class="ipc-rating-star--

In [17]:
ratings = []

for i in range(20):
    rate = movie_rating[i].text.strip()  # strip() with no argument removes surrounding whitespace
    ratings.append(rate)

print(ratings)

['9.3', '9.1', '8.8', '8.8', '8.7', '8.8', '8.8', '8.7', '9.2', '8.9', '9.0', '8.6', '8.4', '8.8', '8.5', '8.5', '8.4', '8.6', '8.3', '8.6']
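Before building the DataFrame, it can help to confirm that the three lists line up and to convert the values to numbers. `pd.DataFrame` would also reject lists of different lengths, but an explicit check gives a clearer message. `build_rows` is a hypothetical helper, not part of any library.

```python
def build_rows(names, years, ratings):
    """Zip the scraped lists into (name, year, rating) tuples with numeric types.

    Raises ValueError if the lists differ in length, which usually means a
    selector matched extra or missing elements.
    """
    if not (len(names) == len(years) == len(ratings)):
        raise ValueError(
            f"length mismatch: {len(names)} names, {len(years)} years, {len(ratings)} ratings"
        )
    return [(name, int(year), float(rating)) for name, year, rating in zip(names, years, ratings)]


rows = build_rows(
    ["The Shawshank Redemption", "The Dark Knight"],
    ["1994", "2008"],
    ["9.3", "9.1"],
)
print(rows)
```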


In [18]:
df = pd.DataFrame({
    "Movie Name": movie_names,
    "Year": movie_years,
    "Rating": ratings,
})
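The `Year` and `Rating` columns hold strings because they come straight from `.text`. A sketch of converting them to numeric types so that sorting and statistics behave numerically; it uses a small sample mirroring the first rows of `df` (`df_demo` is a hypothetical name) so it runs on its own.

```python
import pandas as pd

# Scraped values arrive as strings; this sample mirrors the first rows of df
df_demo = pd.DataFrame({
    "Movie Name": ["The Shawshank Redemption", "The Dark Knight", "Inception"],
    "Year": ["1994", "2008", "2010"],
    "Rating": ["9.3", "9.1", "8.8"],
})

# Convert to numeric types; errors="coerce" turns unparseable ratings into NaN
df_demo["Year"] = df_demo["Year"].astype(int)
df_demo["Rating"] = pd.to_numeric(df_demo["Rating"], errors="coerce")

# Example: highest-rated first, ties broken by the older film
ranked = df_demo.sort_values(["Rating", "Year"], ascending=[False, True]).reset_index(drop=True)
print(ranked)
```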

In [19]:
df

|   | Movie Name | Year | Rating |
|---|---|---|---|
| 0 | The Shawshank Redemption | 1994 | 9.3 |
| 1 | The Dark Knight | 2008 | 9.1 |
| 2 | Inception | 2010 | 8.8 |
| 3 | Fight Club | 1999 | 8.8 |
| 4 | Interstellar | 2014 | 8.7 |
| 5 | Forrest Gump | 1994 | 8.8 |
| 6 | Pulp Fiction | 1994 | 8.8 |
| 7 | The Matrix | 1999 | 8.7 |
| 8 | The Godfather | 1972 | 9.2 |
| 9 | The Lord of the Rings: The Fellowship of the Ring | 2001 | 8.9 |


In [20]:
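df.to_csv("Top 20 imdb movies.csv", index=False)  # writes the file; returns None when given a path

In [21]:
# Read the file back to check the export
pd.read_csv("Top 20 imdb movies.csv").head()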
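`DataFrame.to_csv` writes a file and returns `None` when it is given a path. Called without a path, it returns the CSV text instead, which is a quick way to preview what will be written. A sketch on a two-row sample (`df_demo` is a hypothetical name):

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Movie Name": ["The Shawshank Redemption", "The Dark Knight"],
    "Year": [1994, 2008],
    "Rating": [9.3, 9.1],
})

# With no path argument, to_csv returns the CSV text instead of writing a file
csv_text = df_demo.to_csv(index=False)
print(csv_text)
```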

## **Conclusion**

This project demonstrates how web scraping can be used to extract valuable movie data such as names, release years, and ratings from the IMDb website using Python and BeautifulSoup. The collected data is then organized into a structured format using Pandas and exported to a CSV file for easy analysis.