### This code scrapes data from IMDb search results for drama titles and stores the information in lists

### The data includes: Title, Year, Rating, Genre, Runtime, Votes, and Stars

### The code will loop through 20 pages, each containing 50 TV shows

### The BeautifulSoup library is used to parse the HTML content

## Import necessary libraries

In [2]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import re

### Initialize empty lists to store data

In [3]:
titles = []
years = []
ratings = []
genres = []
runtimes = []
votes_list = []
stars_list = []

### Define the base URL of the IMDb search page (drama genre)

In [4]:
base_url = 'https://www.imdb.com/search/title?genres=drama&start='
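
The value appended to `start=` is the 1-based index of the first result on a page rather than a page number, so with 50 results per page the offsets run 1, 51, 101, and so on. A minimal sketch of that arithmetic, assuming the 50-per-page layout described in this notebook; `page_urls`, `pages`, and `per_page` are illustrative names that do not appear in the original code:

```python
def page_urls(base_url, pages=20, per_page=50):
    """Return one URL per results page, using 1-based result offsets."""
    return [base_url + str(1 + i * per_page) for i in range(pages)]


urls = page_urls('https://www.imdb.com/search/title?genres=drama&start=')
print(urls[0])   # ends with start=1
print(urls[1])   # ends with start=51
print(urls[-1])  # ends with start=951
```

Building the list up front also makes it easy to check the offsets before any request is sent.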

### Loop through 20 pages of TV shows, each page containing 50 TV shows

In [5]:
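for page in range(1, 21):
    # Construct the URL of the current page.
    # `start` is the 1-based index of the first result, so page N begins at (N - 1) * 50 + 1
    url = base_url + str((page - 1) * 50 + 1)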
    
    # Send a request to the URL and get the response
    response = requests.get(url)

    # Parse the HTML content of the response using BeautifulSoup
    soup = BeautifulSoup(response.content, 'html.parser')
    
    # Find all div elements with class='lister-item-content' which contain information of each TV show
    tv_show_list = soup.find_all('div', class_='lister-item-content')
    
    # Loop through all TV shows on the current page and extract necessary information
    for tv_show in tv_show_list:
        # Extract the title of the TV show
        title = tv_show.h3.a.text
        titles.append(title)

        # Extract the year of release of the TV show
        year = tv_show.find('span', class_='lister-item-year').text
        years.append(year)
        
        # Extract the rating of the TV show
        rating_element = tv_show.find('div', {'class': 'inline-block ratings-imdb-rating'})
        if rating_element is not None:
            rating = rating_element.strong.text
            ratings.append(rating)
        else:
            ratings.append('N/A')
           
        
        # Extract the genre of the TV show
        genre = tv_show.find('span', class_='genre').text.strip()
        genres.append(genre)
        
        
        # Extract the runtime of the TV show
        runtime_element = tv_show.find('span', {'class': 'runtime'})
        if runtime_element is not None:
            runtime = runtime_element.text.strip()
        else:
            runtime = 'N/A'
        runtimes.append(runtime)
        
        # Extract number of votes
        votes_element = tv_show.find('span', {'name': 'nv'})
        if votes_element is not None:
            votes = votes_element['data-value']
        else:
            votes = 'N/A'
        votes_list.append(votes)
        
        
        # Extract the stars of the TV show
        stars_element = tv_show.find('p', class_='')
        if stars_element is not None:
            stars = stars_element.find_all('a')
            star_list = [star.text for star in stars]
        else:
            star_list = 'N/A'
        stars_list.append(star_list)
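
The repeated "find, then check for None" pattern in the loop can be factored into a small helper and tried on a static snippet without touching the network. The sketch below is only an illustration: `text_or_default` is a hypothetical helper, and `SAMPLE_HTML` is a hand-written fragment that mimics the class names used above rather than real IMDb markup.

```python
from bs4 import BeautifulSoup

# Hand-written fragment mimicking the class names used above (not real IMDb markup)
SAMPLE_HTML = """
<div class="lister-item-content">
  <h3><a href="#">Example Show</a>
      <span class="lister-item-year">(2023)</span></h3>
  <span class="genre"> Drama </span>
</div>
"""


def text_or_default(parent, name, class_name, default='N/A'):
    """Return the stripped text of the first matching tag, or `default` if absent."""
    element = parent.find(name, class_=class_name)
    return element.text.strip() if element is not None else default


item = BeautifulSoup(SAMPLE_HTML, 'html.parser').find('div', class_='lister-item-content')
record = {
    'Title': item.h3.a.text,
    'Year': text_or_default(item, 'span', 'lister-item-year'),
    'Genre': text_or_default(item, 'span', 'genre'),
    'Runtime': text_or_default(item, 'span', 'runtime'),  # absent in the sample
}
print(record)
```

With a helper like this, a missing year or genre yields `'N/A'` instead of raising an `AttributeError` on `.text`.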
    

### Create a DataFrame

In [6]:
# Create a dataframe using pandas
df = pd.DataFrame({
    'Title': titles,
    'Year': years,
    'Rating': ratings,
    'Genre': genres,
    'Runtime': runtimes,
    'Votes': votes_list,
    'Stars': stars_list
})
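
Every column in this DataFrame holds strings, so values such as `(2023– )`, `50 min`, or `346080` need converting before any numeric analysis. One possible cleaning step, using the `re` module imported at the top, is sketched below. The function names are illustrative, and the patterns assume the formats that appear in this notebook's output.

```python
import re


def parse_year(text):
    """Return the first four-digit year in a string like '(2023– )', else None."""
    match = re.search(r'\d{4}', text)
    return int(match.group()) if match else None


def parse_runtime(text):
    """Return the minutes in a string like '50 min', else None (e.g. for 'N/A')."""
    match = re.search(r'(\d+)\s*min', text)
    return int(match.group(1)) if match else None


def parse_votes(text):
    """Return the vote count from '346080' or '346,080', else None."""
    digits = text.replace(',', '')
    return int(digits) if digits.isdigit() else None


print(parse_year('(2023– )'))
print(parse_runtime('50 min'))
print(parse_runtime('N/A'))
print(parse_votes('346080'))
```

These functions could then be applied column by column, for example `df['Year'].map(parse_year)`.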

In [7]:
df.head()

| | Title | Year | Rating | Genre | Runtime | Votes | Stars |
|---|---|---|---|---|---|---|---|
| 0 | The Last of Us | (2023– ) | 8.9 | Action, Adventure, Drama | 50 min | 346080 | [Pedro Pascal, Bella Ramsey, Anna Torv, Lamar ... |
| 1 | Shadow and Bone | (2021– ) | 7.6 | Action, Adventure, Drama | 60 min | 96411 | [Jessie Mei Li, Ben Barnes, Archie Renaux, Fre... |
| 2 | Swarm | (2023– ) | 7.3 | Comedy, Drama, Thriller | | 5041 | [Dominique Fishback, Chloe Bailey, Nirine S. B... |
| 3 | Ted Lasso | (2020– ) | 8.8 | Comedy, Drama, Sport | 30 min | 239726 | [Jason Sudeikis, Brett Goldstein, Brendan Hunt... |
| 4 | The Night Agent | (2023– ) | 7.7 | Action, Drama, Thriller | | 9394 | [Gabriel Basso, Luciane Buchanan, Fola Evans-A... |


In [8]:
# save the dataframe to a CSV file
df.to_csv('tv_shows.csv', index=False)
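
CSV stores every cell as text, so two things change when the file is loaded again: the `Stars` lists come back as strings, and pandas reads `N/A` as a missing value by default. A small sketch of that round trip, using invented rows and an in-memory buffer in place of `tv_shows.csv`:

```python
import ast
import io

import pandas as pd

# Invented rows standing in for the scraped data
sample = pd.DataFrame({
    'Title': ['Show A', 'Show B'],
    'Runtime': ['50 min', 'N/A'],
    'Stars': [['Actor One', 'Actor Two'], ['Actor Three']],
})

buffer = io.StringIO()
sample.to_csv(buffer, index=False)
buffer.seek(0)

reloaded = pd.read_csv(buffer)
print(type(reloaded.loc[0, 'Stars']))       # a string, not a list
print(reloaded['Runtime'].isna().tolist())  # 'N/A' was read as missing

# Turn the stringified lists back into real lists
reloaded['Stars'] = reloaded['Stars'].apply(ast.literal_eval)
print(reloaded.loc[0, 'Stars'])
```

If the literal `N/A` strings should survive the reload, passing `keep_default_na=False` to `pd.read_csv` is one option.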

##### This notebook uses Python and the Beautiful Soup library to scrape IMDb search results for drama titles. The code loops through 20 pages of results and extracts each title's name, year of release, rating, genre, runtime, number of votes, and stars. The extracted data is stored in lists, which are used to create a pandas DataFrame. Finally, the DataFrame is exported to a CSV file for further analysis. It serves as an example for anyone looking to scrape web pages with Python and Beautiful Soup.