# **Web Scraping & Data Handling Challenge**



### **Website:**
JustWatch -  https://www.justwatch.com/in/movies?release_year_from=2000


### **Description:**

JustWatch is a popular platform that allows users to search for movies and TV shows across multiple streaming services like Netflix, Amazon Prime, Hulu, etc. For this assignment, you will be required to scrape movie and TV show data from JustWatch using Selenium, Python, and BeautifulSoup. Extract data from HTML, not by directly calling their APIs. Then, perform data filtering and analysis using Pandas, and finally, save the results to a CSV file.

### **Tasks:**

**1. Web Scraping:**

Use BeautifulSoup to scrape the following data from JustWatch:

   **a. Movie Information:**

      - Movie title
      - Release year
      - Genre
      - IMDb rating
      - Streaming services available (Netflix, Amazon Prime, Hulu, etc.)
      - URL to the movie page on JustWatch

   **b. TV Show Information:**

      - TV show title
      - Release year
      - Genre
      - IMDb rating
      - Streaming services available (Netflix, Amazon Prime, Hulu, etc.)
      - URL to the TV show page on JustWatch

  **c. Scope:**

```
   - Scrape data for at least 50 movies and 50 TV shows.
   - You can choose the entry point (e.g., starting with popular movies,
     or a specific genre, etc.) to ensure a diverse dataset.

```


**2. Data Filtering & Analysis:**

   After scraping the data, use Pandas to perform the following tasks:

   **a. Filter movies and TV shows based on specific criteria:**

   ```
      - Only include movies and TV shows released in the last 2 years (from the current date).
      - Only include movies and TV shows with an IMDb rating of 7 or higher.
```

   **b. Data Analysis:**

   ```
      - Calculate the average IMDb rating for the scraped movies and TV shows.
      - Identify the top 5 genres that have the highest number of available movies and TV shows.
      - Determine the streaming service with the most significant number of offerings.
      
   ```   

**3. Data Export:**

```
   - Dump the filtered and analysed data into a CSV file for further processing and reporting.

   - Keep the CSV file in your Drive folder and share the Drive link in the Colab notebook, with view access set to anyone with the link.
```

**Submission:**
```
- Submit a link to your Colab made for the assignment.

- The Colab should contain your Python script (.py format only) with clear
  comments explaining the scraping, filtering, and analysis process.

- Your code shouldn't have any errors and should be executable in one go.

- Before the conclusion, keep your dataset Drive link in the notebook.
```



**Note:**

1. Properly handle errors and exceptions during web scraping to ensure a robust script.

2. Make sure your code is well-structured, easy to understand, and follows Python best practices.

3. The assignment will be evaluated based on the correctness of the scraped data, accuracy of data filtering and analysis, and the overall quality of the Python code.








# **Start The Project**

## **Task 1:- Web Scraping**

In [None]:
# Installing all necessary libraries
!pip install bs4
!pip install requests

Collecting bs4
  Downloading bs4-0.0.2-py2.py3-none-any.whl (1.2 kB)
Installing collected packages: bs4
Successfully installed bs4-0.0.2


In [None]:
# Import all necessary libraries
import requests
from bs4 import BeautifulSoup
import re
import pandas as pd
import numpy as np

## **Scraping Movies Data**

In [None]:
# Specifying the URL from which movies related data will be fetched
url='https://www.justwatch.com/in/movies?release_year_from=2000'

# Sending an HTTP GET request to the URL
page=requests.get(url)
# Parsing the HTML content using BeautifulSoup with the 'html.parser'
soup=BeautifulSoup(page.text,'html.parser')
# Printing the prettified HTML content
print(soup.prettify())

<!DOCTYPE html>
<html data-vue-meta="%7B%22dir%22:%7B%22ssr%22:%22ltr%22%7D,%22lang%22:%7B%22ssr%22:%22en%22%7D%7D" data-vue-meta-server-rendered="" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
  <meta charset="utf-8" data-vue-meta="ssr"/>
  <meta content="IE=edge" data-vue-meta="ssr" httpequiv="X-UA-Compatible"/>
  <meta content="viewport-fit=cover, width=device-width, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0, user-scalable=no" data-vue-meta="ssr" name="viewport"/>
  <meta content="JustWatch" data-vue-meta="ssr" property="og:site_name"/>
  <meta content="794243977319785" data-vue-meta="ssr" property="fb:app_id"/>
  <meta content="/appassets/img/JustWatch_logo_with_claim.png" data-vmid="og:image" data-vue-meta="ssr" property="og:image"/>
  <meta content="606" data-vmid="og:image:width" data-vue-meta="ssr" property="og:image:width"/>
  <meta content="302" data-vmid="og:image:height" data-vue-meta="ssr" pro
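
The `requests.get(url)` call above has no timeout and does not check the HTTP status, so a slow or failed response would go unnoticed. The following is a minimal sketch of a more defensive fetch, not part of the original notebook. The helper name `fetch_html` and its `get`, `retries`, `delay` and `timeout` parameters are assumptions made for illustration.

```python
import time

import requests


def fetch_html(url, get=requests.get, retries=3, delay=2.0, timeout=10):
    """Return the page body for `url`, retrying on network or HTTP errors.

    `get` is injectable (hypothetical parameter) so the helper can be
    exercised without touching the network.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            response = get(url, timeout=timeout)
            response.raise_for_status()  # turn 4xx/5xx responses into exceptions
            return response.text
        except requests.RequestException as error:
            last_error = error
            if attempt < retries:
                time.sleep(delay)  # pause before the next attempt
    raise RuntimeError(f"giving up on {url} after {retries} attempts") from last_error
```

With a helper like this, the soup could be built as `BeautifulSoup(fetch_html(url), 'html.parser')`, and a page that keeps failing raises one clear error instead of producing an empty parse.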

## **Fetching Movie URLs**

In [None]:
# Write Your Code here
soup.find_all('a',attrs={'class':'title-list-grid__item--link'})[0]['href']

'/in/movie/laapataa-ladies'

In [None]:


list_link=[]
for i in soup.find_all('a',attrs={'class':'title-list-grid__item--link'}):
  list_link.append('https://www.justwatch.com'+i['href'])

In [None]:
list_link

['https://www.justwatch.com/in/movie/laapataa-ladies',
 'https://www.justwatch.com/in/movie/manjummel-boys',
 'https://www.justwatch.com/in/movie/family-star',
 'https://www.justwatch.com/in/movie/aavesham-2024',
 'https://www.justwatch.com/in/movie/black-magic-2024',
 'https://www.justwatch.com/in/movie/article-370',
 'https://www.justwatch.com/in/movie/madgaon-express',
 'https://www.justwatch.com/in/movie/godzilla-x-kong-the-new-empire',
 'https://www.justwatch.com/in/movie/yodha-2022',
 'https://www.justwatch.com/in/movie/premalu',
 'https://www.justwatch.com/in/movie/the-crew-2024',
 'https://www.justwatch.com/in/movie/dune-part-two',
 'https://www.justwatch.com/in/movie/kung-fu-panda-4',
 'https://www.justwatch.com/in/movie/monkey-man',
 'https://www.justwatch.com/in/movie/oppenheimer',
 'https://www.justwatch.com/in/movie/untitled-shahid-kapoor-kriti-sanon-film',
 'https://www.justwatch.com/in/movie/hanu-man',
 'https://www.justwatch.com/in/movie/anyone-but-you',
 'https://www.j
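
The links above are built by concatenating the site root with each relative `href`. As a hedged alternative, `urllib.parse.urljoin` from the standard library resolves relative paths and leaves already-absolute URLs untouched. The helper name `absolute_links` and the de-duplication step are assumptions added for illustration.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.justwatch.com"


def absolute_links(hrefs, base=BASE_URL):
    """Resolve hrefs against `base`, dropping duplicates but keeping order."""
    seen = set()
    links = []
    for href in hrefs:
        link = urljoin(base, href)
        if link not in seen:
            seen.add(link)
            links.append(link)
    return links


# Two hrefs taken from the listing above, one of them repeated on purpose
print(absolute_links(["/in/movie/laapataa-ladies", "/in/movie/premalu", "/in/movie/premalu"]))
```

Dropping duplicates keeps the later per-page loop from requesting the same title twice if the listing repeats an entry.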

## **Scraping Movie Title**

In [None]:
# url='https://www.justwatch.com/in/movies?release_year_from=2000'

# # Sending an HTTP GET request to the URL
# page=requests.get(url)
# # Parsing the HTML content using BeautifulSoup with the 'html.parser'
# soup=BeautifulSoup(page.text,'html.parser')

In [None]:
# soup.find_all('div',attrs={'class':'title-list-grid__item'})[0]['data-title']

In [None]:
# for i in soup.find_all('div',attrs={'class':'title-list-grid__item'}):
  # print(i['data-title'])

In [None]:
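# `soup` must hold a movie detail page here (such as the Crakk page requested in the Genres section), not the listing page
soup.find_all('div', attrs = {'data-testid':'titleBlock'})[0].find_all('h1')[0].text.strip()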

'Crakk: Jeetega... Toh Jiyegaa'

## **Scraping Release Year**

In [None]:
# Extract the four-digit release year from the title block text
int(re.search(r'\d{4}', soup.find_all('div', attrs = {'data-testid':'titleBlock'})[0].find_all('span')[0].text).group())

2024

## **Scraping Genres**

In [None]:
# Specifying the URL from which movies related data will be fetched
url = 'https://www.justwatch.com/in/movie/crakk-jeetegaa-toh-jiyegaa'

# Sending an HTTP GET request to the URL
page=requests.get(url)
# Parsing the HTML content using BeautifulSoup with the 'html.parser'
soup=BeautifulSoup(page.text,'html.parser')

In [None]:
# Write Your Code here
info_dict = {}
for i in soup.find_all('div',attrs = {'class':'detail-infos'}):
  if i.find_all('h3', attrs = {'class':'detail-infos__subheading'})[0].text =='Genres':
    info_dict['Genres'] = i.find_all('div', attrs = {'class':'detail-infos__value'})[0].text.strip()

In [None]:
info_dict

{'Genres': 'Mystery & Thriller, Action & Adventure, Sport'}

In [None]:
soup.find_all('div',attrs = {'class':'detail-infos'})[2].text

'GenresMystery & Thriller, Action & Adventure, Sport'

## **Scraping IMDb Rating**

In [None]:
# Take the leading number of the rating text and convert it to a float
float(soup.find_all('div',attrs = {'class':'detail-infos'})[1].find_all('span')[0].text.strip().split(' ')[0])

4.9
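
The `Rating` values collected later in this notebook are strings such as `8.5 (23k)`, `8.7 (2m)` or a bare `9.5`. The sketch below shows one way to split such a string into a numeric rating and a vote count with a regular expression. The function name `parse_rating` and the reading of `k`/`m` as thousands/millions are assumptions, not something the page itself confirms.

```python
import re

# rating, then an optional "(<number><k|m>)" vote count
_RATING = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(?:\((\d+(?:\.\d+)?)([km]?)\))?\s*$")
_MULTIPLIER = {"": 1, "k": 1_000, "m": 1_000_000}  # assumed meaning of the suffixes


def parse_rating(text):
    """Split a string such as '8.5 (23k)' into (rating, votes).

    Returns (None, None) when the text does not look like a rating,
    and (rating, None) when no vote count is present.
    """
    match = _RATING.match(text)
    if match is None:
        return None, None
    rating = float(match.group(1))
    if match.group(2) is None:
        return rating, None
    votes = int(float(match.group(2)) * _MULTIPLIER[match.group(3)])
    return rating, votes


for raw in ["8.5 (23k)", "8.7 (2m)", "9.5"]:
    print(raw, "->", parse_rating(raw))
```

Keeping the vote count alongside the rating makes it possible to treat ratings backed by very few votes with more caution during analysis.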

## **Scraping Runtime/Duration**

In [None]:
# Write Your Code here
info_dict = {}

for i in soup.find_all('div',attrs = {'class':'detail-infos'}):
  if  i.find_all('h3', attrs = {'class':'detail-infos__subheading'})[0].text =='Runtime':
    info_dict['Runtime'] = i.find_all('div', attrs = {'class':'detail-infos__value'})[0].text.strip()

In [None]:
info_dict

{'Runtime': '2h 34min'}

OR

In [None]:
soup.find_all('div',attrs = {'class':'detail-infos'})[3].text

'Runtime2h 34min'
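
Runtime arrives as text (`2h 34min` here, `59min` for TV episodes later on), which cannot be averaged or compared directly. A small sketch of converting it to minutes follows. The function name `runtime_to_minutes` is hypothetical, and the pattern assumes only the `h` and `min` units seen in this notebook.

```python
import re


def runtime_to_minutes(text):
    """Convert a runtime such as '2h 34min' or '59min' to total minutes."""
    match = re.fullmatch(r"\s*(?:(\d+)h)?\s*(?:(\d+)min)?\s*", text)
    # Reject text that has neither an hours part nor a minutes part
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised runtime: {text!r}")
    hours = int(match.group(1) or 0)
    minutes = int(match.group(2) or 0)
    return hours * 60 + minutes


for raw in ["2h 34min", "59min"]:
    print(raw, "->", runtime_to_minutes(raw))
```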

## **Scraping Age Rating**

In [None]:
# Write Your Code here
info_dict = {}
for i in soup.find_all('div',attrs = {'class':'detail-infos'}):
  if i.find_all('h3', attrs = {'class':'detail-infos__subheading'})[0].text =='Age rating':
    info_dict['Age_rating'] = i.find_all('div', attrs = {'class':'detail-infos__value'})[0].text.strip()

In [None]:
info_dict

{'Age_rating': 'UA'}

OR


In [None]:
soup.find_all('div',attrs = {'class':'detail-infos'})[4].text

'Age ratingUA'

## **Fetching Production Countries Details**

In [None]:
# Write Your Code here
info_dict = {}
for i in soup.find_all('div',attrs = {'class':'detail-infos'}):
  if i.find_all('h3', attrs = {'class':'detail-infos__subheading'})[0].text.strip() =='Production country':
    info_dict['Production_country'] = i.find_all('div', attrs = {'class':'detail-infos__value'})[0].text.strip()

In [None]:
info_dict

{'Production_country': 'India'}

OR


In [None]:
soup.find_all('div',attrs = {'class':'detail-infos'})[5].text

' Production country India'
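
The genre, runtime, age rating and production country lookups above all repeat the same pattern: find each `detail-infos` block, read its subheading, and keep its value. The sketch below folds that pattern into one helper that returns every heading and value at once. The helper name `detail_infos` and the sample HTML are assumptions. The sample only mimics the class names used in this notebook and is not copied from the site.

```python
from bs4 import BeautifulSoup


def detail_infos(soup):
    """Map each 'detail-infos' subheading to its value text."""
    details = {}
    for block in soup.find_all("div", class_="detail-infos"):
        heading = block.find("h3", class_="detail-infos__subheading")
        value = block.find("div", class_="detail-infos__value")
        if heading is None or value is None:
            continue  # skip incomplete blocks instead of raising IndexError
        details[heading.get_text().strip()] = value.get_text().strip()
    return details


# Hypothetical HTML that mimics the class names used in this notebook
sample = """
<div class="detail-infos"><h3 class="detail-infos__subheading">Genres</h3>
  <div class="detail-infos__value">Comedy, Drama</div></div>
<div class="detail-infos"><h3 class="detail-infos__subheading"> Production country </h3>
  <div class="detail-infos__value">India</div></div>
"""
info = detail_infos(BeautifulSoup(sample, "html.parser"))
print(info.get("Genres"), "|", info.get("Age rating"))
```

Looking fields up by heading, with `dict.get` for missing ones, avoids relying on positional indexes such as `[3]` or `[4]`, which shift whenever a page omits a field.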

## **Now Creating Movies DataFrame**

In [None]:
import time
time.sleep(2)

In [None]:
# Map each detail-infos subheading on the page to the DataFrame column it fills
columns = {'Genres':'Genres', 'Runtime':'Runtime', 'Age rating':'Age_rating',
           'Production country':'Production_country', 'Rating':'Rating'}
movie_info_full_data = []
for l in list_link:
  info_dict = {}
  try:
    # Sending an HTTP GET request inside the try block, so a network error skips only this movie
    page=requests.get(l)
    # Parsing the HTML content using BeautifulSoup with the 'html.parser'
    soup=BeautifulSoup(page.text,'html.parser')
    info_dict['link'] = l
    # Extract the four-digit release year from the title block text
    year_text = soup.find_all('div',attrs={'data-testid':'titleBlock'})[0].find_all('span')[0].text
    info_dict['release_year'] = int(re.search(r'\d{4}', year_text).group())
    for i in soup.find_all('div',attrs = {'class':'detail-infos'}):
      heading = i.find_all('h3', attrs = {'class':'detail-infos__subheading'})[0].text.strip()
      if heading in columns:
        info_dict[columns[heading]] = i.find_all('div', attrs = {'class':'detail-infos__value'})[0].text.strip()

    movie_info_full_data.append(info_dict)
    print('Successful:',l)
  except Exception:
    # Report the failed page and carry on with the next one
    print('Error:',l)

  time.sleep(2)

Successful: https://www.justwatch.com/in/movie/laapataa-ladies
Successful: https://www.justwatch.com/in/movie/manjummel-boys
Successful: https://www.justwatch.com/in/movie/family-star
Successful: https://www.justwatch.com/in/movie/aavesham-2024
Successful: https://www.justwatch.com/in/movie/black-magic-2024
Successful: https://www.justwatch.com/in/movie/article-370
Successful: https://www.justwatch.com/in/movie/madgaon-express
Successful: https://www.justwatch.com/in/movie/godzilla-x-kong-the-new-empire
Successful: https://www.justwatch.com/in/movie/yodha-2022
Successful: https://www.justwatch.com/in/movie/premalu
Successful: https://www.justwatch.com/in/movie/the-crew-2024
Successful: https://www.justwatch.com/in/movie/dune-part-two
Successful: https://www.justwatch.com/in/movie/kung-fu-panda-4
Successful: https://www.justwatch.com/in/movie/monkey-man
Successful: https://www.justwatch.com/in/movie/oppenheimer
Successful: https://www.justwatch.com/in/movie/untitled-shahid-kapoor-kriti-

In [None]:
movies_data = pd.DataFrame(movie_info_full_data)

In [None]:
movies_data

Unnamed: 0,link,release_year,Rating,Genres,Runtime,Age_rating,Production_country
0,https://www.justwatch.com/in/movie/laapataa-la...,2024,8.5 (23k),"Comedy, Drama",2h 2min,UA,India
1,https://www.justwatch.com/in/movie/manjummel-boys,2024,8.4 (13k),"Mystery & Thriller, Action & Adventure, Drama",2h 15min,UA,India
2,https://www.justwatch.com/in/movie/family-star,2024,5.1 (2k),"Drama, Action & Adventure, Comedy, Romance",2h 39min,,India
3,https://www.justwatch.com/in/movie/aavesham-2024,2024,8.0 (7k),"Action & Adventure, Comedy",2h 38min,,India
4,https://www.justwatch.com/in/movie/black-magic...,2024,6.7 (43k),"Mystery & Thriller, Horror, Drama",2h 12min,UA,India
...,...,...,...,...,...,...,...
91,https://www.justwatch.com/in/movie/dada-2023,2023,8.1 (6k),"Comedy, Romance, Drama, Kids & Family",2h 15min,U,India
92,https://www.justwatch.com/in/movie/interstellar,2014,8.7 (2m),"Science-Fiction, Action & Adventure, Drama",2h 49min,,"United Kingdom, United States"
93,https://www.justwatch.com/in/movie/swatantra-v...,2024,7.7 (14k),"Drama, History",2h 56min,,India
94,https://www.justwatch.com/in/movie/idi-minnal-...,2024,9.5,"Drama, Action & Adventure",2h 11min,,India


## **Scraping TV Show Data**

In [None]:
# Specifying the URL from which tv show related data will be fetched
tv_url='https://www.justwatch.com/in/tv-shows?release_year_from=2000'
# Sending an HTTP GET request to the URL
page=requests.get(tv_url)
# Parsing the HTML content using BeautifulSoup with the 'html.parser'
soup=BeautifulSoup(page.text,'html.parser')
# Printing the prettified HTML content
print(soup.prettify())

<!DOCTYPE html>
<html data-vue-meta="%7B%22dir%22:%7B%22ssr%22:%22ltr%22%7D,%22lang%22:%7B%22ssr%22:%22en%22%7D%7D" data-vue-meta-server-rendered="" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
  <meta charset="utf-8" data-vue-meta="ssr"/>
  <meta content="IE=edge" data-vue-meta="ssr" httpequiv="X-UA-Compatible"/>
  <meta content="viewport-fit=cover, width=device-width, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0, user-scalable=no" data-vue-meta="ssr" name="viewport"/>
  <meta content="JustWatch" data-vue-meta="ssr" property="og:site_name"/>
  <meta content="794243977319785" data-vue-meta="ssr" property="fb:app_id"/>
  <meta content="/appassets/img/JustWatch_logo_with_claim.png" data-vmid="og:image" data-vue-meta="ssr" property="og:image"/>
  <meta content="606" data-vmid="og:image:width" data-vue-meta="ssr" property="og:image:width"/>
  <meta content="302" data-vmid="og:image:height" data-vue-meta="ssr" pro

## **Fetching TV Show URL Details**

In [None]:
# Write Your Code here
soup.find_all('div',attrs={'class':'title-list-grid__item'})[0].find_all('a')[0]['href']


'/in/tv-show/shogun-2024'

In [None]:
list_links=[]
for i in soup.find_all('div',attrs={'class':'title-list-grid__item'}):
  list_links.append('https://www.justwatch.com'+i.find_all('a')[0]['href'])

In [None]:
list_links

['https://www.justwatch.com/in/tv-show/shogun-2024',
 'https://www.justwatch.com/in/tv-show/heeramandi',
 'https://www.justwatch.com/in/tv-show/panchayat',
 'https://www.justwatch.com/in/tv-show/fallout',
 'https://www.justwatch.com/in/tv-show/mirzapur',
 'https://www.justwatch.com/in/tv-show/game-of-thrones',
 'https://www.justwatch.com/in/tv-show/3-body-problem',
 'https://www.justwatch.com/in/tv-show/dead-boy-detectives',
 'https://www.justwatch.com/in/tv-show/baby-reindeer',
 'https://www.justwatch.com/in/tv-show/young-sheldon',
 'https://www.justwatch.com/in/tv-show/apharan',
 'https://www.justwatch.com/in/tv-show/attack-on-titan',
 'https://www.justwatch.com/in/tv-show/murder-in-mahim',
 'https://www.justwatch.com/in/tv-show/sunflower-2021',
 'https://www.justwatch.com/in/tv-show/inspector-rishi',
 'https://www.justwatch.com/in/tv-show/farzi',
 'https://www.justwatch.com/in/tv-show/aashram',
 'https://www.justwatch.com/in/tv-show/lucifer',
 'https://www.justwatch.com/in/tv-show/a

## **Fetching TV Show Title Details**

In [None]:
tv_url='https://www.justwatch.com/in/tv-show/shogun-2024'
# Sending an HTTP GET request to the URL
page=requests.get(tv_url)
# Parsing the HTML content using BeautifulSoup with the 'html.parser'
soup=BeautifulSoup(page.text,'html.parser')

In [None]:
# Write Your Code here
soup.find_all('div',attrs={'data-testid':'titleBlock'})[0].find_all('h1')[0].text.strip()

'Shōgun'

## **Fetching Release Year**

In [None]:
# Extract the four-digit release year from the title block text
int(re.search(r'\d{4}', soup.find_all('div',attrs={'data-testid':'titleBlock'})[0].find_all('span')[0].text).group())

2024

## **Fetching TV Show Genre Details**

In [None]:
soup.find_all('div',attrs={'class':'detail-infos__value'})[2].text

'War & Military, Drama, History'

## **Fetching IMDb Rating Details**

In [None]:
# Take the leading number of the rating text and convert it to a float
float(soup.find_all('div',attrs={'class':'detail-infos__value'})[1].find_all('span')[1].text.strip().split()[0])

8.8

## **Fetching Age Rating Details**

In [None]:
# Write Your Code here
tv_url='https://www.justwatch.com/in/tv-show/fallout'
# Sending an HTTP GET request to the URL
page=requests.get(tv_url)
# Parsing the HTML content using BeautifulSoup with the 'html.parser'
soup=BeautifulSoup(page.text,'html.parser')


In [None]:
soup.find_all('div',attrs={'class':'detail-infos'})[4].text

'Age ratingA'

## **Fetching Production Country details**

In [None]:
# Write Your Code here
soup.find_all('div',attrs={'class':'detail-infos__value'})[4].text

'United States'

## **Fetching Streaming Service details**

In [None]:
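# Write Your Code here
# This selector matches nothing on the title page (the result is an empty list), so no streaming service is captured
soup.find_all('div',attrs={'class':'shadow-boxed container page__about-us__mission'})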

[]

## **Fetching Duration Details**

In [None]:
# Write Your Code here
soup.find_all('div',attrs={'class':'detail-infos__value'})[3].text

'59min'

## **Creating TV Show DataFrame**

In [None]:
import time
time.sleep(2)

In [None]:
# Map each detail-infos subheading on the page to the DataFrame column it fills
columns = {'Genres':'Genres', 'Runtime':'Runtime', 'Age rating':'Age_rating',
           'Production country':'Production_country', 'Rating':'Rating'}
movie_full = []
for l in list_links:
  info_dict = {}
  try:
    # Sending an HTTP GET request inside the try block, so a network error skips only this TV show
    page=requests.get(l)
    # Parsing the HTML content using BeautifulSoup with the 'html.parser'
    soup=BeautifulSoup(page.text,'html.parser')
    info_dict['link'] = l
    # Extract the four-digit release year from the title block text
    year_text = soup.find_all('div',attrs={'data-testid':'titleBlock'})[0].find_all('span')[0].text
    info_dict['release_year'] = int(re.search(r'\d{4}', year_text).group())
    for i in soup.find_all('div',attrs = {'class':'detail-infos'}):
      heading = i.find_all('h3', attrs = {'class':'detail-infos__subheading'})[0].text.strip()
      if heading in columns:
        info_dict[columns[heading]] = i.find_all('div', attrs = {'class':'detail-infos__value'})[0].text.strip()

    movie_full.append(info_dict)
    print('Successful:',l)
  except Exception:
    # Report the failed page and carry on with the next one
    print('Error:',l)

  time.sleep(2)


Successful: https://www.justwatch.com/in/tv-show/shogun-2024
Successful: https://www.justwatch.com/in/tv-show/heeramandi
Successful: https://www.justwatch.com/in/tv-show/panchayat
Successful: https://www.justwatch.com/in/tv-show/fallout
Successful: https://www.justwatch.com/in/tv-show/mirzapur
Successful: https://www.justwatch.com/in/tv-show/game-of-thrones
Successful: https://www.justwatch.com/in/tv-show/3-body-problem
Successful: https://www.justwatch.com/in/tv-show/dead-boy-detectives
Successful: https://www.justwatch.com/in/tv-show/baby-reindeer
Successful: https://www.justwatch.com/in/tv-show/young-sheldon
Successful: https://www.justwatch.com/in/tv-show/apharan
Successful: https://www.justwatch.com/in/tv-show/attack-on-titan
Successful: https://www.justwatch.com/in/tv-show/murder-in-mahim
Successful: https://www.justwatch.com/in/tv-show/sunflower-2021
Successful: https://www.justwatch.com/in/tv-show/inspector-rishi
Error: https://www.justwatch.com/in/tv-show/farzi
Successful: htt

KeyboardInterrupt: 

In [None]:
tv_show_data=pd.DataFrame(movie_full)

In [None]:
tv_show_data

Unnamed: 0,link,release_year,Rating,Genres,Runtime,Production_country,Age_rating
0,https://www.justwatch.com/in/tv-show/shogun-2024,2024,8.8 (122k),"War & Military, Drama, History",59min,United States,
1,https://www.justwatch.com/in/tv-show/heeramandi,2024,6.5 (23k),"Drama, History, Romance",55min,India,
2,https://www.justwatch.com/in/tv-show/panchayat,2020,8.9 (85k),"Comedy, Drama",33min,India,
3,https://www.justwatch.com/in/tv-show/fallout,2024,8.5 (168k),"Action & Adventure, Drama, Science-Fiction, Wa...",59min,United States,A
4,https://www.justwatch.com/in/tv-show/mirzapur,2018,8.5 (82k),"Crime, Action & Adventure, Drama, Mystery & Th...",50min,India,
...,...,...,...,...,...,...,...
74,https://www.justwatch.com/in/tv-show/loki,2021,8.2 (412k),"Science-Fiction, Action & Adventure, Fantasy, ...",49min,United States,
75,https://www.justwatch.com/in/tv-show/them,2021,7.5 (26k),"Drama, Horror, Mystery & Thriller",41min,United States,
76,https://www.justwatch.com/in/tv-show/x-men-97,2024,9.0 (20k),"Animation, Action & Adventure, Science-Fiction...",33min,United States,
77,https://www.justwatch.com/in/tv-show/college-r...,2018,8.3 (28k),"Comedy, Drama, Romance",31min,India,A


## **Task 2 :- Data Filtering & Analysis**

In [None]:
# Write Your Code here

tv_show_data.head()


Unnamed: 0,link,release_year,Rating,Genres,Runtime,Production_country,Age_rating
0,https://www.justwatch.com/in/tv-show/shogun-2024,2024,8.8 (122k),"War & Military, Drama, History",59min,United States,
1,https://www.justwatch.com/in/tv-show/heeramandi,2024,6.5 (23k),"Drama, History, Romance",55min,India,
2,https://www.justwatch.com/in/tv-show/panchayat,2020,8.9 (85k),"Comedy, Drama",33min,India,
3,https://www.justwatch.com/in/tv-show/fallout,2024,8.5 (168k),"Action & Adventure, Drama, Science-Fiction, Wa...",59min,United States,A
4,https://www.justwatch.com/in/tv-show/mirzapur,2018,8.5 (82k),"Crime, Action & Adventure, Drama, Mystery & Th...",50min,India,


In [None]:
tv_show_data.shape

(79, 7)

In [None]:
tv_show_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 79 entries, 0 to 78
Data columns (total 7 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   link                79 non-null     object
 1   release_year        79 non-null     int64 
 2   Rating              79 non-null     object
 3   Genres              79 non-null     object
 4   Runtime             79 non-null     object
 5   Production_country  79 non-null     object
 6   Age_rating          31 non-null     object
dtypes: int64(1), object(6)
memory usage: 4.4+ KB


In [None]:
tv_show_data.describe()

Unnamed: 0,release_year
count,79.0
mean,2018.974684
std,5.663592
min,2002.0
25%,2017.0
50%,2020.0
75%,2024.0
max,2024.0


## **Calculating Mean IMDb Ratings for both Movies and TV Shows**

Mean IMDb rating for TV shows

In [None]:
# Strip the vote count in parentheses, e.g. '8.8 (122k)' -> '8.8'
tv_show_data['Rating'] = tv_show_data['Rating'].str.replace(r"\(\d+(\.\d+)?[mk]\)", "", regex=True).str.strip()

# Replace empty strings with NaN
tv_show_data['Rating'] = tv_show_data['Rating'].replace('', np.nan)

# Convert the 'Rating' column to float, coercing errors to NaN
tv_show_data['Rating']= pd.to_numeric(tv_show_data['Rating'], errors='coerce')

# Calculate the mean of the 'Rating' column, ignoring NaN values
mean_rating = tv_show_data['Rating'].mean()
print("Mean Rating:", mean_rating)




Mean Rating: 7.827272727272728


Mean IMDb rating for movies


In [None]:
# Strip the vote count in parentheses, e.g. '8.5 (23k)' -> '8.5'
movies_data['Rating'] = movies_data['Rating'].str.replace(r"\(\d+(\.\d+)?[mk]\)", "", regex=True).str.strip()

# Replace empty strings with NaN
movies_data['Rating'] = movies_data['Rating'].replace('', np.nan)

# Convert the 'Rating' column to float, coercing errors to NaN
movies_data['Rating']= pd.to_numeric(movies_data['Rating'], errors='coerce')

# Calculate the mean of the 'Rating' column, ignoring NaN values
mean_rating = movies_data['Rating'].mean()
print("Mean Rating:", mean_rating)

Mean Rating: 7.022340425531915
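
The two means above are computed separately, while the brief asks for the average across the scraped movies and TV shows. One possible approach, sketched here on tiny stand-in frames, is to stack both DataFrames with a label column and aggregate once. The function name `combine_titles` and the `type` column are assumptions for illustration. The sample ratings are a few of the values shown in the tables above.

```python
import pandas as pd


def combine_titles(movies, shows):
    """Stack both frames and record each row's origin in a 'type' column."""
    return pd.concat(
        [movies.assign(type="movie"), shows.assign(type="tv_show")],
        ignore_index=True,
    )


# Tiny stand-ins for movies_data and tv_show_data (Rating already numeric)
movies = pd.DataFrame({"Rating": [8.5, 5.1], "release_year": [2024, 2024]})
shows = pd.DataFrame({"Rating": [8.8, None], "release_year": [2024, 2020]})

combined = combine_titles(movies, shows)
print(combined.groupby("type")["Rating"].mean())  # per-type mean, NaN ignored
print(combined["Rating"].mean())                  # overall mean, NaN ignored
```

A single combined frame also simplifies the later filtering and export steps, since one CSV can then hold both kinds of title.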


## **Analyzing Top Genres**

In [None]:
# Preview the Genres column of the first five movies (this is a preview, not a frequency ranking)
top_geners=movies_data['Genres'].head(5)

In [None]:
top_geners

0                                    Comedy, Drama
1    Mystery & Thriller, Action & Adventure, Drama
2       Drama, Action & Adventure, Comedy, Romance
3                       Action & Adventure, Comedy
4                Mystery & Thriller, Horror, Drama
Name: Genres, dtype: object
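
The output above lists the `Genres` strings of the first five rows and does not rank genres by how often they occur. A hedged sketch of a frequency count follows: split each comma-separated string, put every genre on its own row, and count. The function name `top_genres` is hypothetical. The sample rows are the five values shown above.

```python
import pandas as pd


def top_genres(frame, n=5, column="Genres"):
    """Count individual genres across all rows and return the n most frequent."""
    genres = (
        frame[column]
        .dropna()
        .str.split(",")   # 'Comedy, Drama' -> ['Comedy', ' Drama']
        .explode()        # one row per genre
        .str.strip()      # remove the space left after each comma
    )
    return genres.value_counts().head(n)


# The five Genres values shown in the output above
sample = pd.DataFrame({"Genres": [
    "Comedy, Drama",
    "Mystery & Thriller, Action & Adventure, Drama",
    "Drama, Action & Adventure, Comedy, Romance",
    "Action & Adventure, Comedy",
    "Mystery & Thriller, Horror, Drama",
]})
print(top_genres(sample))
```

Applied to a combined movies and TV shows frame, the same call would answer the brief's top 5 genres question directly.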

## **Task 3 :- Data Export**

In [None]:

file_path = 'tv_show_data.csv'

# Save the dataframe to a CSV file
tv_show_data.to_csv(file_path, index=False)

print(f"Final dataframe saved as '{file_path}' successfully!")


Final dataframe saved as 'tv_show_data.csv' successfully!


In [None]:
#saving filter data as Filter Data in csv format
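
The cell above is empty, and the filtering from Task 2 (released in the last 2 years, IMDb rating of 7 or higher) is not applied anywhere in the notebook. The sketch below shows one way it could be done. Since only the release year is scraped, "last 2 years" is approximated at year granularity, which is an assumption. The function name `filter_recent_high_rated` and its parameters are likewise hypothetical, and the sample rows reuse three values from the TV show table above.

```python
import datetime as dt

import pandas as pd


def filter_recent_high_rated(frame, min_rating=7.0, years=2, today=None):
    """Keep rows released within the last `years` years with Rating >= min_rating."""
    today = today or dt.date.today()
    earliest_year = today.year - years
    # Rows with a missing Rating compare as False and are dropped
    mask = (frame["release_year"] >= earliest_year) & (frame["Rating"] >= min_rating)
    return frame[mask].reset_index(drop=True)


# Rows in the shape of tv_show_data after the Rating column has been made numeric
sample = pd.DataFrame({
    "link": [
        "https://www.justwatch.com/in/tv-show/shogun-2024",
        "https://www.justwatch.com/in/tv-show/heeramandi",
        "https://www.justwatch.com/in/tv-show/mirzapur",
    ],
    "release_year": [2024, 2024, 2018],
    "Rating": [8.8, 6.5, 8.5],
})
filtered = filter_recent_high_rated(sample, today=dt.date(2024, 5, 1))
print(filtered)
```

The result could then be written with `filtered.to_csv('filtered_data.csv', index=False)`, where the file name is only a placeholder.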


# ***Congratulations!!! You have completed your Assignment.***