Q1. Write a Python program to scrape all available books from Books to Scrape
(https://books.toscrape.com/), a live site built for practicing scraping (safe, legal, no
anti-bot measures). For each book, extract the following details:
1. Title
2. Price
3. Availability (In stock / Out of stock)
4. Star Rating (One, Two, Three, Four, Five)
Store the scraped results into a Pandas DataFrame and export them to a CSV file named
books.csv.

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

base_url = "https://books.toscrape.com/catalogue/page-{}.html"
books = []

for page in range(1, 51):  # the catalogue spans 50 pages
    url = base_url.format(page)
    res = requests.get(url)
    if res.status_code != 200:
        break  # stop at the first page that does not load
    # The pages are UTF-8; without this, requests decodes "£" as "Â£".
    res.encoding = "utf-8"
    soup = BeautifulSoup(res.text, "html.parser")
    items = soup.find_all("article", class_="product_pod")

    for item in items:
        title = item.h3.a["title"]  # the link text is truncated; the title attribute is complete
        price = item.find("p", class_="price_color").text.strip()
        availability = item.find("p", class_="instock availability").text.strip()
        # The first <p> has classes like ["star-rating", "Three"]
        star_class = item.p["class"][1]
        books.append([title, price, availability, star_class])

df_books = pd.DataFrame(books, columns=["Title", "Price", "Availability", "Star Rating"])
df_books.to_csv("books.csv", index=False)
df_books.head()


Unnamed: 0,Title,Price,Availability,Star Rating
0,A Light in the Attic,£51.77,In stock,Three
1,Tipping the Velvet,£53.74,In stock,One
2,Soumission,£50.10,In stock,One
3,Sharp Objects,£47.82,In stock,Four
4,Sapiens: A Brief History of Humankind,£54.23,In stock,Five
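
The Price and Star Rating columns above are stored as text, which makes sorting or averaging awkward. A minimal sketch of one way to convert them to numbers follows. The helper names parse_price and rating_to_int are hypothetical and not part of the notebook. The price parser keeps only the digits, so it works whether or not the currency symbol decoded cleanly.

```python
import re

# Hypothetical helpers for illustration; they are not part of the notebook above.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def parse_price(text):
    """Return the numeric part of a price string such as '£51.77' as a float."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group())


def rating_to_int(word):
    """Map a star-rating class name ('One' to 'Five') to an integer from 1 to 5."""
    return RATING_WORDS[word]


print(parse_price("£51.77"))   # 51.77
print(rating_to_int("Three"))  # 3
```

With a DataFrame like df_books, these could be applied column-wise, for example df_books["Price"].map(parse_price).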


Q2. Write a Python program to scrape the IMDb Top 250 Movies list
(https://www.imdb.com/chart/top/). For each movie, extract the following details:
1. Rank (1–250)
2. Movie Title
3. Year of Release
4. IMDb Rating

In [4]:
!apt-get update
!apt-get install -y chromium-browser chromium-chromedriver

Get:1 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Hit:2 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:3 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:4 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [127 kB]

In [7]:
!pip install selenium beautifulsoup4

Collecting selenium
  Downloading selenium-4.35.0-py3-none-any.whl.metadata (7.4 kB)
Collecting trio~=0.30.0 (from selenium)
  Downloading trio-0.30.0-py3-none-any.whl.metadata (8.5 kB)
Collecting trio-websocket~=0.12.2 (from selenium)
  Downloading trio_websocket-0.12.2-py3-none-any.whl.metadata (5.1 kB)
Collecting typing_extensions~=4.14.0 (from selenium)
  Downloading typing_extensions-4.14.1-py3-none-any.whl.metadata (3.0 kB)
Collecting outcome (from trio~=0.30.0->selenium)
  Downloading outcome-1.3.0.post0-py2.py3-none-any.whl.metadata (2.6 kB)
Collecting wsproto>=0.14 (from trio-websocket~=0.12.2->selenium)
  Downloading wsproto-1.2.0-py3-none-any.whl.metadata (5.6 kB)
Downloading selenium-4.35.0-py3-none-any.whl (9.6 MB)
Downloading trio-0.30.0-py3-none-any.whl (499 kB)

In [10]:
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import time

options = Options()
options.add_argument("--headless")
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")

driver = webdriver.Chrome(options=options)
driver.get("https://www.imdb.com/chart/top/")
time.sleep(3)

soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()

movies = []
rows = soup.select("li.ipc-metadata-list-summary-item")

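for rank, row in enumerate(rows, start=1):
    title_tag = row.select_one("h3")
    year_tag = row.select_one("span.ipc-title__subtext")
    rating_tag = row.select_one("span.ipc-rating-star--rating")
    if title_tag is None:
        continue  # skip rows that do not match the expected layout
    # Strip only the leading "N." rank prefix, not matches elsewhere in the title.
    title = title_tag.text.strip().removeprefix(f"{rank}.").strip()
    # These selectors depend on IMDb's current markup; a missing field becomes None.
    year = year_tag.text.strip("()") if year_tag else None
    rating = rating_tag.text if rating_tag else None
    movies.append([rank, title, year, rating])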

df_imdb = pd.DataFrame(movies, columns=["Rank", "Title", "Year", "Rating"])
df_imdb.to_csv("imdb_top250.csv", index=False)
df_imdb.head()

Unnamed: 0,Rank,Title,Year,Rating
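
The loop above assumes each list heading has the form "N. Title" and that N matches the position in the list. A minimal sketch of reading both values from the heading text itself is shown below, so the rank does not depend on enumeration order. The function name split_rank_title is hypothetical, and the "N. Title" format is an assumption about the page, not something confirmed by the empty output above.

```python
import re


def split_rank_title(heading):
    """Split a heading like '12. 12 Angry Men' into (12, '12 Angry Men').

    Returns (None, heading) when there is no leading rank, so a title such as
    'Dr. Strangelove' is left untouched.
    """
    match = re.match(r"\s*(\d+)\.\s+(.*\S)\s*$", heading)
    if match is None:
        return None, heading.strip()
    return int(match.group(1)), match.group(2)


print(split_rank_title("1. The Shawshank Redemption"))
print(split_rank_title("12. 12 Angry Men"))
```

Anchoring the pattern at the start of the string means digits that appear inside a title are never removed.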


Q3. Write a Python program to scrape the weather information for top world cities from
https://www.timeanddate.com/weather/. For each city, extract the following details:
1. City Name
2. Temperature
3. Weather Condition (e.g., Clear, Cloudy, Rainy, etc.)

In [12]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://www.timeanddate.com/weather/"
res = requests.get(url)
soup = BeautifulSoup(res.text, "html.parser")

cities = []
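# Match rows whether or not the markup has <tbody>; html.parser does not insert one.
rows = soup.select("table tr")

for row in rows:
    city_tag = row.find("a")
    cells = row.find_all("td")
    # Skip header rows and rows too short to hold city, temperature, and condition.
    if not city_tag or len(cells) < 3:
        continue
    city = city_tag.text.strip()
    # Assumes the second and third cells hold temperature and condition;
    # check this against the live markup.
    temp = cells[1].text.strip()
    condition = cells[2].text.strip()
    cities.append([city, temp, condition])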

df_weather = pd.DataFrame(cities, columns=["City", "Temperature", "Condition"])
df_weather.to_csv("weather.csv", index=False)
df_weather.head()

Unnamed: 0,City,Temperature,Condition
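
The Temperature column is scraped as text. A minimal sketch of turning such a string into a number and a unit follows, assuming the page renders values like "84 °F" or "29 °C" (possibly with a non-breaking space). The names parse_temperature and to_celsius are hypothetical and not part of the notebook.

```python
import re


def parse_temperature(text):
    """Parse a string like '84 °F' into (84.0, 'F'); return None if it does not match."""
    # \s also matches the non-breaking space that web pages often put before the unit.
    match = re.search(r"(-?\d+(?:\.\d+)?)\s*°\s*([CF])", text)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


def to_celsius(value, unit):
    """Convert a temperature to Celsius; Celsius values are returned unchanged."""
    if unit == "C":
        return value
    return (value - 32) * 5 / 9


value, unit = parse_temperature("212 °F")
print(to_celsius(value, unit))  # 100.0
```

Returning None for unmatched text lets the caller decide whether to drop the row or keep the raw string.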
