<a href="https://colab.research.google.com/github/neha2gupta7/Web-Scrapping/blob/main/Numerical_Programming_in_Python_Web_Scraping_Neha_Gupta.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Web Scraping & Data Handling Challenge**



### **Website:**
JustWatch -  https://www.justwatch.com/in/movies?release_year_from=2000


### **Description:**

JustWatch is a popular platform that allows users to search for movies and TV shows across multiple streaming services like Netflix, Amazon Prime, Hulu, etc. For this assignment, you will be required to scrape movie and TV show data from JustWatch using Selenium, Python, and BeautifulSoup. Extract data from HTML, not by directly calling their APIs. Then, perform data filtering and analysis using Pandas, and finally, save the results to a CSV file.

### **Tasks:**

**1. Web Scraping:**

Use BeautifulSoup to scrape the following data from JustWatch:

   **a. Movie Information:**

      - Movie title
      - Release year
      - Genre
      - IMDb rating
      - Streaming services available (Netflix, Amazon Prime, Hulu, etc.)
      - URL to the movie page on JustWatch

   **b. TV Show Information:**

      - TV show title
      - Release year
      - Genre
      - IMDb rating
      - Streaming services available (Netflix, Amazon Prime, Hulu, etc.)
      - URL to the TV show page on JustWatch

  **c. Scope:**

```
   - Scrape data for at least 50 movies and 50 TV shows.
   - You can choose the entry point (e.g., starting with popular movies,
     or a specific genre, etc.) to ensure a diverse dataset.
```


**2. Data Filtering & Analysis:**

   After scraping the data, use Pandas to perform the following tasks:

   **a. Filter movies and TV shows based on specific criteria:**

   ```
      - Only include movies and TV shows released in the last 2 years (from the current date).
      - Only include movies and TV shows with an IMDb rating of 7 or higher.
```
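
The two filters above could be sketched as follows. This is a minimal example, assuming a DataFrame with hypothetical columns `release_year` and `imdb_rating`; the sample rows are made up and are not scraped data. "Last 2 years" is read here as "release year no earlier than the current year minus 2", which is one reasonable interpretation.

```python
from datetime import date

import pandas as pd


def filter_recent_high_rated(df, today=None, years=2, min_rating=7.0):
    """Keep rows released in the last `years` years with rating >= min_rating."""
    today = today or date.today()
    cutoff_year = today.year - years
    # Coerce to numbers so placeholder strings such as 'NA' become NaN and drop out
    year = pd.to_numeric(df["release_year"], errors="coerce")
    rating = pd.to_numeric(df["imdb_rating"], errors="coerce")
    return df[(year >= cutoff_year) & (rating >= min_rating)]


# Illustrative rows only (made-up values, not scraped data)
sample = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "release_year": ["2024", "2016", "2023", "NA"],
    "imdb_rating": ["7.7", "8.0", "4.2", "7.5"],
})
result = filter_recent_high_rated(sample, today=date(2024, 8, 15))
print(result)
```

Passing `today` explicitly keeps the example reproducible; leaving it out uses the current date.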

   **b. Data Analysis:**

   ```
      - Calculate the average IMDb rating for the scraped movies and TV shows.
      - Identify the top 5 genres that have the highest number of available movies and TV shows.
      - Determine the streaming service with the largest number of offerings.
      
   ```   
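
A minimal sketch of the three analysis steps, assuming hypothetical columns `imdb_rating` (numeric), `genre` (comma-separated string), and `streaming_service` (comma-separated string). The rows are made up for illustration; `count_items` is a helper introduced here and not part of the assignment.

```python
import pandas as pd

# Illustrative rows only (made-up values, not scraped data)
df = pd.DataFrame({
    "imdb_rating": [7.5, 8.5, 6.0],
    "genre": ["Drama, Crime", "Drama, Comedy", "Comedy"],
    "streaming_service": ["Netflix , Hotstar", "Netflix", "Zee5 , Netflix"],
})


def count_items(series, sep=","):
    """Split each cell on `sep`, strip whitespace, and count occurrences."""
    items = series.str.split(sep).explode().str.strip().reset_index(drop=True)
    return items[items != ""].value_counts()


average_rating = df["imdb_rating"].mean()
top_genres = count_items(df["genre"]).head(5)
top_service = count_items(df["streaming_service"]).idxmax()

print(average_rating)
print(top_genres)
print(top_service)
```

Splitting and exploding the comma-separated cells counts each genre or service once per title, rather than treating the whole string as a single category.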

**3. Data Export:**

```
   - Dump the filtered and analysed data into a CSV file for further processing and reporting.

   - Keep the CSV file in your Drive folder and share the Drive link in the Colab notebook, with view access granted to anyone with the link.
```

**Submission:**
```
- Submit a link to your Colab made for the assignment.

- The Colab should contain your Python script (.py format only) with clear
  comments explaining the scraping, filtering, and analysis process.

- Your code shouldn't have any errors and should be executable in one go.

- Before the conclusion, include your dataset Drive link in the notebook.
```



**Note:**

1. Properly handle errors and exceptions during web scraping to ensure a robust script.

2. Make sure your code is well-structured, easy to understand, and follows Python best practices.

3. The assignment will be evaluated based on the correctness of the scraped data, accuracy of data filtering and analysis, and the overall quality of the Python code.
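
One way to approach the error-handling note is a small retry wrapper around whatever function downloads a page. This is a sketch only: `fetch_with_retries` and `flaky_fetch` are hypothetical names, and the stand-in fetcher replaces a real network call so the example runs offline.

```python
import time


def fetch_with_retries(fetch, url, retries=3, delay=1.0):
    """Call fetch(url), retrying on any exception; return None if all attempts fail."""
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as exc:
            print(f"Attempt {attempt} for {url} failed: {exc}")
            if attempt < retries:
                time.sleep(delay)
    return None


# Stand-in fetcher that fails twice and then succeeds (no network access needed)
calls = {"count": 0}


def flaky_fetch(url):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return f"<html>{url}</html>"


page = fetch_with_retries(flaky_fetch, "https://example.com", delay=0)
print(page)
```

In the notebook, `fetch` could be something like `lambda u: requests.get(u, headers=headers, timeout=10)`, so that a stalled connection raises instead of hanging.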








# **Start The Project**

## **Task 1: Web Scraping**

In [172]:
# Install the necessary libraries
!pip install bs4
!pip install requests



In [173]:
# Import the necessary libraries
import requests
from bs4 import BeautifulSoup
import re
import pandas as pd
import numpy as np

## **Scraping Movie Data**

In [174]:
# Specify the URL from which movie data will be fetched

# The request was returning a 403 error; sending these browser-like headers (found online) avoids it
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
}
url='https://www.justwatch.com/in/movies?release_year_from=2000'

# Sending an HTTP GET request to the URL
response=requests.get(url, headers= headers)
# Parsing the HTML content using BeautifulSoup with the 'html.parser'
soup=BeautifulSoup(response.text,'html.parser')
# Printing the prettified HTML content
print(soup.prettify())

<!DOCTYPE html>
<html data-vue-meta="%7B%22dir%22:%7B%22ssr%22:%22ltr%22%7D,%22lang%22:%7B%22ssr%22:%22en%22%7D%7D" data-vue-meta-server-rendered="" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
  <meta charset="utf-8" data-vue-meta="ssr"/>
  <meta content="IE=edge" data-vue-meta="ssr" httpequiv="X-UA-Compatible"/>
  <meta content="viewport-fit=cover, width=device-width, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0, user-scalable=no" data-vue-meta="ssr" name="viewport"/>
  <meta content="JustWatch" data-vue-meta="ssr" property="og:site_name"/>
  <meta content="794243977319785" data-vue-meta="ssr" property="fb:app_id"/>
  <meta content="/appassets/img/JustWatch_logo_with_claim.png" data-vmid="og:image" data-vue-meta="ssr" property="og:image"/>
  <meta content="606" data-vmid="og:image:width" data-vue-meta="ssr" property="og:image:width"/>
  <meta content="302" data-vmid="og:image:height" data-vue-meta="ssr" pro

## **Fetching Movie URLs**

In [175]:
# Write Your Code here
url_list=[]
for x in soup.find_all('a', class_="title-list-grid__item--link"):
  url_list.append('https://www.justwatch.com'+x['href'])

print(len(url_list))
print(url_list)

100
['https://www.justwatch.com/in/movie/kill-2024', 'https://www.justwatch.com/in/movie/munjha', 'https://www.justwatch.com/in/movie/maharaja-2024', 'https://www.justwatch.com/in/movie/project-k', 'https://www.justwatch.com/in/movie/deadpool-3', 'https://www.justwatch.com/in/movie/stree-2', 'https://www.justwatch.com/in/movie/stree', 'https://www.justwatch.com/in/movie/chandu-champion', 'https://www.justwatch.com/in/movie/kingdom-of-the-planet-of-the-apes', 'https://www.justwatch.com/in/movie/aadujeevitham', 'https://www.justwatch.com/in/movie/deadpool', 'https://www.justwatch.com/in/movie/agent', 'https://www.justwatch.com/in/movie/dune-part-two', 'https://www.justwatch.com/in/movie/the-gangster-the-cop-the-devil', 'https://www.justwatch.com/in/movie/bad-boys-4', 'https://www.justwatch.com/in/movie/the-ministry-of-ungentlemanly-warfare', 'https://www.justwatch.com/in/movie/aavesham-2024', 'https://www.justwatch.com/in/movie/indian-2', 'https://www.justwatch.com/in/movie/phir-aayi-has

## **Scraping Movie Titles**

In [176]:
# Write Your Code here
import time
movie_title = []

for url in url_list:
  try:
    response = requests.get(url, headers = headers)
    soup = BeautifulSoup(response.text,'html.parser')
    title=soup.find_all('h1')[0].text
  except:
    title='NA'

  movie_title.append(title)
  time.sleep(1)

print(movie_title)
print(len(movie_title))

[' Kill (2024)', ' Munjya (2024)', ' Maharaja (2024)', ' Kalki 2898-AD (2024)', ' Deadpool & Wolverine (2024)', ' Woman 2: Terror of the Headless (2024)', ' Stree (2018)', ' Chandu Champion (2024)', ' Kingdom of the Planet of the Apes (2024)', ' The Goat Life (2024)', ' Deadpool (2016)', ' Agent (2023)', ' Dune: Part Two (2024)', ' The Gangster, the Cop, the Devil (2019)', ' Bad Boys: Ride or Die (2024)', ' The Ministry of Ungentlemanly Warfare (2024)', ' Aavesham (2024)', ' Indian 2 (2024)', ' Phir Aayi Hasseen Dillruba (2024)', ' The Fall Guy (2024)', ' Laila Majnu (2018)', ' Bhaiyya Ji (2024)', ' 365 Days (2020)', ' Ullozhukku (2024)', ' Furiosa: A Mad Max Saga (2024)', ' Weapon (2024)', ' Harom Hara (2024)', ' Je Jatt Vigad Gya (2024)', ' Maharshi (2019)', ' A Quiet Place: Day One (2024)', ' Mr. & Mrs. Mahi (2024)', ' Salaar (2023)', ' Deadpool 2 (2018)', ' The Family Star (2024)', ' Golam (2024)', ' Love Lies Bleeding (2024)', ' Savi (2024)', ' Dune (2021)', ' Laapataa Ladies (202
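
The scraped headings carry a leading space and the release year in parentheses. A minimal sketch for separating the two, assuming every heading follows the `Title (YYYY)` pattern seen in the output above; `split_title_year` is a hypothetical helper.

```python
import re

# Optional leading whitespace, a lazily matched title, then a 4-digit year in parentheses
TITLE_YEAR = re.compile(r"^\s*(?P<title>.*?)\s*\((?P<year>\d{4})\)\s*$")


def split_title_year(raw):
    """Return (title, year) from a heading like ' Kill (2024)'; year is None if absent."""
    match = TITLE_YEAR.match(raw)
    if match:
        return match.group("title"), int(match.group("year"))
    return raw.strip(), None


print(split_title_year(" Kill (2024)"))
print(split_title_year(" Kalki 2898-AD (2024)"))
print(split_title_year("NA"))
```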

## **Scraping Release Year**

In [177]:
# Write Your Code here
import time
movie_year=[]

for url in url_list:
  try:
    response=requests.get(url,headers=headers)
    soup=BeautifulSoup(response.text,'html.parser')
    year=soup.find_all('span',class_='release-year')[0].text.strip('()')
  except:
    year='NA'

  movie_year.append(year)
  time.sleep(1)

print(movie_year)
print(len(movie_year))

['2024', '2024', '2024', '2024', '2024', '2024', '2018', '2024', '2024', '2024', '2016', '2023', '2024', '2019', '2024', '2024', '2024', '2024', '2024', '2024', '2018', '2024', '2020', '2024', '2024', '2024', '2024', '2024', '2019', '2024', '2024', '2023', '2018', '2024', '2024', '2024', '2024', '2021', '2024', '2024', '2024', '2024', '2024', '2023', '2023', '2024', '2024', '2024', '2024', '2024', '2024', '2009', '2024', '2023', '2015', '2022', '2024', '2023', '2023', '2024', '2016', '2024', '2023', '2024', '2021', '2024', '2024', '2024', '2011', '2024', '2022', '2001', '2024', '2024', '2024', '2024', '2016', '2024', '2018', '2024', '2002', '2024', '2024', '2017', '2024', '2018', '2024', '2004', '2019', '2024', '2023', '2013', '2015', '2022', '2018', '2022', '2003', '2014', '2024', '2013']
100


## **Scraping Genres**

In [178]:
# Fetch each movie page and read the text under the 'Genres' heading
import time
movie_genre = []

for url in url_list:
  genre = 'NA'  # reset per movie so a missing field never reuses the previous value
  try:
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    for x in soup.find_all('div', class_='detail-infos'):
      if x.find_all('h3')[0].text == 'Genres':
        genre = x.find_all('span')[0].text
  except Exception:
    genre = 'NA'

  movie_genre.append(genre)
  time.sleep(1)

print(len(movie_genre))
print(movie_genre)


100
['Drama, Mystery & Thriller, Action & Adventure, Crime', 'Horror, Comedy', 'Mystery & Thriller, Action & Adventure, Crime, Drama', 'Drama, Fantasy, Science-Fiction, Mystery & Thriller, Action & Adventure', 'Action & Adventure, Comedy, Science-Fiction', 'Comedy, Horror', 'Horror, Comedy, Drama', 'Drama, History, Sport, War & Military, Action & Adventure', 'Science-Fiction, Action & Adventure, Drama, Mystery & Thriller', 'Drama', 'Comedy, Action & Adventure', 'Mystery & Thriller, Action & Adventure', 'Science-Fiction, Action & Adventure, Drama', 'Crime, Action & Adventure, Mystery & Thriller', 'Comedy, Action & Adventure, Crime, Mystery & Thriller', 'Action & Adventure, Comedy, War & Military', 'Action & Adventure, Comedy', 'Action & Adventure, Drama, Mystery & Thriller', 'Mystery & Thriller, Romance, Crime, Drama', 'Drama, Romance, Action & Adventure, Comedy', 'Drama, Romance', 'Drama, Action & Adventure', 'Drama, Romance, Made in Europe', 'Drama', 'Action & Adventure, Science-Ficti

## **Scraping IMDb Rating**

In [179]:
# Fetch each movie page and read the IMDb score, e.g. '7.7 (19k)'
import time
movie_imdbrating = []

for url in url_list:
  imdbrating = 'NA'  # reset per movie so a missing score never reuses the previous value
  try:
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    imdbratings = soup.find_all('span', class_='imdb-score')
    if imdbratings:
      # To keep only the numeric rating, use imdbratings[0].text.strip().split()[0]
      imdbrating = imdbratings[0].text.strip()
  except Exception:
    imdbrating = 'NA'

  movie_imdbrating.append(imdbrating)
  time.sleep(1)

print(len(movie_imdbrating))
print(movie_imdbrating)

100
['7.7 (19k)', '7.2 (15k)', '8.6 (39k)', '7.6 (43k)', '8.0 (233k)', '7.9 (15k)', '7.5 (40k)', '8.0 (29k)', '7.0 (107k)', '8.0 (15k)', '8.0 (1m)', '4.2 (1.5k)', '8.5 (499k)', '6.9 (25k)', '6.7 (58k)', '6.8 (87k)', '7.9 (18k)', '4.2 (14k)', '5.8 (4.2k)', '6.9 (129k)', '7.7 (5.7k)', '5.1 (6.8k)', '3.3 (100k)', '7.6 (965)', '7.6 (167k)', '6.9 (5.4k)', '8.0 (2.8k)', '4.7 (500)', '7.2 (10k)', '6.4 (77k)', '6.0 (15k)', '6.5 (68k)', '7.6 (675k)', '5.3 (4.5k)', '7.3 (2.4k)', '6.7 (39k)', '6.6 (33k)', '8.0 (897k)', '8.4 (39k)', '6.5 (12k)', '6.8 (73k)', '8.2 (3.1k)', '6.3 (52k)', '7.9 (55k)', '8.3 (785k)', '8.3 (21k)', '7.8 (103k)', '6.2 (32k)', '6.9 (3.4k)', '7.4 (2k)', '6.1 (98k)', '4.3 (9.6k)', '6.9 (15k)', '7.7 (18k)', '8.1 (821k)', '6.7 (26k)', '7.2 (2.3k)', '8.9 (126k)', '6.2 (96k)', '6.1 (69k)', '8.3 (215k)', '6.9 (80k)', '6.7 (138k)', '7.8 (14k)', '6.9 (32k)', '5.1 (1.8k)', '7.1 (50k)', '7.1 (50k)', '7.8 (50k)', '5.5 (14k)', '7.0 (158k)', '7.6 (867k)', '7.8', '7.5 (11k)', '6.0 (22k)',
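
The rating strings combine the score and a vote count, so they need to be split before any numeric filtering. A minimal sketch, assuming the suffixes `k` and `m` stand for thousands and millions (an interpretation of the output above, not something the page confirms); `parse_imdb` is a hypothetical helper.

```python
def parse_imdb(raw):
    """Split a string like '8.0 (1m)' into (rating, votes); missing parts become None."""
    multipliers = {"k": 1_000, "m": 1_000_000}  # assumed meaning of the suffixes
    parts = raw.replace("(", " ").replace(")", " ").split()
    try:
        rating = float(parts[0])
    except (IndexError, ValueError):
        return None, None
    votes = None
    if len(parts) > 1:
        token = parts[1].lower()
        factor = multipliers.get(token[-1], 1)
        number = token[:-1] if token[-1] in multipliers else token
        try:
            votes = int(round(float(number) * factor))
        except ValueError:
            votes = None
    return rating, votes


for raw in ["7.7 (19k)", "4.2 (1.5k)", "7.6 (965)", "8.0 (1m)", "7.8", "NA"]:
    print(raw, "->", parse_imdb(raw))
```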

## **Scraping Runtime/Duration**

In [180]:
# Fetch each movie page and read the text under the 'Runtime' heading
import time
movie_runtime = []

for url in url_list:
  runtime = 'NA'  # reset per movie so a missing field never reuses the previous value
  try:
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    for x in soup.find_all('div', class_='detail-infos'):
      if x.find_all('h3')[0].text == 'Runtime':
        runtime = x.find_all('div')[0].text
  except Exception:
    runtime = 'NA'

  movie_runtime.append(runtime)
  time.sleep(1)

print(len(movie_runtime))
print(movie_runtime)


100
['1h 45min', '2h 3min', '2h 30min', '3h 1min', '2h 8min', '2h 27min', '2h 8min', '2h 22min', '2h 25min', '2h 0min', '1h 48min', '2h 34min', '2h 47min', '1h 50min', '1h 55min', '2h 2min', '2h 38min', '3h 0min', '2h 13min', '2h 6min', '2h 19min', '2h 30min', '1h 54min', '2h 3min', '2h 28min', '2h 0min', '2h 34min', '2h 12min', '2h 56min', '1h 39min', '2h 19min', '2h 55min', '1h 59min', '2h 39min', '2h 0min', '1h 44min', '2h 3min', '2h 35min', '2h 2min', '2h 12min', '2h 3min', '2h 19min', '1h 34min', '2h 4min', '3h 0min', '2h 15min', '1h 37min', '1h 34min', '2h 25min', '2h 28min', '1h 55min', '2h 34min', '2h 10min', '2h 30min', '1h 35min', '2h 36min', '2h 13min', '2h 26min', '3h 24min', '1h 55min', '2h 41min', '1h 55min', '2h 37min', '2h 36min', '2h 15min', '2h 28min', '1h 41min', '2h 30min', '2h 39min', '1h 49min', '1h 47min', '2h 32min', '2h 39min', '2h 14min', '2h 20min', '1h 49min', '2h 50min', '2h 12min', '1h 53min', '2h 16min', '1h 33min', '2h 30min', '3h 1min', '2h 17min', '2h 
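
Runtimes such as `'1h 45min'` are easier to sort and average as a number of minutes. A minimal sketch, assuming the `Xh Ymin` format shown in the output above; `runtime_to_minutes` is a hypothetical helper.

```python
import re


def runtime_to_minutes(raw):
    """Convert '1h 45min' (or '45min', '2h') to total minutes; return None if unparseable."""
    match = re.fullmatch(r"\s*(?:(\d+)\s*h)?\s*(?:(\d+)\s*min)?\s*", raw)
    # An empty string matches the pattern with both groups unset, so reject that case
    if not match or not any(match.groups()):
        return None
    hours = int(match.group(1) or 0)
    minutes = int(match.group(2) or 0)
    return hours * 60 + minutes


for raw in ["1h 45min", "3h 0min", "45min", "NA"]:
    print(raw, "->", runtime_to_minutes(raw))
```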

## **Scraping Age Rating**

In [181]:
# Fetch each movie page and read the text under the 'Age rating' heading
import time
movie_agerating = []

for url in url_list:
  age_rating = 'NA'  # reset per movie so a missing field never reuses the previous value
  try:
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    for x in soup.find_all('div', class_='detail-infos'):
      if x.find_all('h3')[0].text == 'Age rating':
        age_rating = x.find_all('div')[0].text
  except Exception:
    age_rating = 'NA'  # the original assigned to `runtime` here by mistake

  movie_agerating.append(age_rating)
  time.sleep(1)

print(len(movie_agerating))
print(movie_agerating)

100
['A', 'A', 'A', 'UA', 'A', 'UA', 'UA', 'UA', 'UA', 'UA', 'A', 'UA', 'UA', 'UA', 'UA', 'UA', 'UA', 'UA', 'UA', 'UA', 'UA', 'UA', 'UA', 'U', 'A', 'UA', 'A', 'A', 'UA', 'UA', 'U', 'A', 'A', 'A', 'UA', 'UA', 'UA', 'UA', 'UA', 'UA', 'UA', 'UA', 'U', 'U', 'UA', 'UA', 'U', 'U', 'A', 'A', 'UA', 'U', 'A', 'UA', 'U', 'UA', 'UA', 'UA', 'A', 'A', 'U', 'A', 'A', 'U', 'U', 'U', 'A', 'UA', 'UA', 'UA', 'A', 'U', 'UA', 'U', 'UA', 'UA', 'UA', 'UA', 'A', 'UA', 'U', 'UA', 'UA', 'A', 'UA', 'UA', 'UA', 'A', 'A', 'A', 'UA', 'A', 'A', 'UA', 'A', 'UA', 'UA', 'A', 'A', 'U']


## **Fetching Production Countries Details**

In [182]:
# Fetch each movie page and read the text under the 'Production country' heading
import time
movie_productioncountries = []

for url in url_list:
  production_country = 'NA'  # reset per movie so a missing field never reuses the previous value
  try:
    content = requests.get(url, headers=headers)
    soup = BeautifulSoup(content.text, 'html.parser')
    for x in soup.find_all('div', class_='detail-infos'):
      if x.find_all('h3')[0].text == ' Production country ':
        production_country = x.find_all('div')[0].text
  except Exception:
    production_country = 'NA'

  movie_productioncountries.append(production_country)
  time.sleep(1)

print(len(movie_productioncountries))
print(movie_productioncountries)

100
['United States, India', 'India', 'India', 'India', 'United States', 'India', 'India', 'India', 'United States', 'United States, India', 'United States', 'India', 'United States', 'South Korea', 'United States', 'United States, United Kingdom, Turkey', 'India', 'India', 'India', 'United States, Australia, Canada', 'India', 'India', 'Poland', 'India', 'Australia, United States', 'India', 'India', 'India', 'India', 'United States, United Kingdom, Canada', 'India', 'India', 'United States', 'India', 'India', 'United Kingdom, United States', 'India', 'United States', 'India', 'India', 'United States', 'India', 'United States', 'Japan, Germany', 'United States, United Kingdom', 'India', 'United States', 'United States', 'India', 'India', 'United States', 'India, Thailand, China', 'United States', 'India', 'United States', 'India', 'India', 'India', 'India', 'United States', 'India, United States', 'United States', 'Canada, United States', 'India', 'India', 'India', 'Canada, United State

## **Fetching Streaming Service Details**

In [183]:
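# Fetch each movie page and collect the provider names from the offer icons
import time
movie_streaming_services = []

for url in url_list:
  try:
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    # Build the list inside the try block so a missing 'alt' attribute is caught here
    names = [x['alt'] for x in soup.find_all('img', class_="offer__icon")]
  except Exception:
    names = ['NA']  # a list, so join() yields 'NA' rather than 'N , A'

  movie_streaming_services.append(" , ".join(names))
  time.sleep(1)

print(len(movie_streaming_services))
print(movie_streaming_services)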

100
['Apple TV+', 'Apple TV+', 'Netflix , Bookmyshow , Apple TV+', 'Amazon Prime Video , Amazon Prime Video , Netflix , Apple TV+ , Amazon Video , Bookmyshow', 'Bookmyshow , Apple TV+', 'Bookmyshow , Apple TV+', 'Apple TV , Hotstar , Apple TV , Apple TV+ , Apple TV', 'Amazon Prime Video , Amazon Prime Video , Bookmyshow , Apple TV+', 'Apple TV , Hotstar , Apple TV , Apple TV+ , Amazon Video , Apple TV', 'Netflix , Apple TV+', 'Apple TV , Hotstar , Amazon Video , Apple TV+ , Apple TV , Apple TV', 'Apple TV+', 'Apple TV , Jio Cinema , Amazon Video , Apple TV+ , Apple TV , Apple TV', 'Apple TV+', 'Apple TV , Zee5 , Amazon Video , Apple TV+ , Apple TV , Apple TV', 'Amazon Prime Video , Amazon Prime Video , Apple TV+', 'Amazon Prime Video , Amazon Prime Video , Hotstar , Apple TV+ , Amazon Video', 'Netflix , Apple TV+', 'Netflix , Apple TV+', 'Apple TV , Zee5 , Amazon Video , Apple TV+ , Apple TV , Apple TV', 'Zee5 , Apple TV+', 'Zee5 , Apple TV+', 'Netflix , Apple TV+', 'Amazon Prime Video
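
The output above repeats some providers for the same title (for example `Amazon Prime Video` twice), which would inflate per-service counts later. A minimal sketch of an order-preserving de-duplication; `unique_services` is a hypothetical helper.

```python
def unique_services(raw, sep=","):
    """Split a joined service string and drop repeats, keeping first-seen order."""
    names = [name.strip() for name in raw.split(sep)]
    # dict.fromkeys keeps insertion order, which makes it a simple ordered de-duplicator
    return list(dict.fromkeys(name for name in names if name))


print(unique_services("Amazon Prime Video , Amazon Prime Video , Netflix , Apple TV+"))
print(unique_services(""))
```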

## **Now Creating Movies DataFrame**

In [208]:
# Write Your Code here
info_dict = {
    'movie_link': url_list,
    'movie_title': movie_title,
    'movie_year': movie_year,
    'movie_genre': movie_genre,
    'imdb_rating': movie_imdbrating,
    'movie_runtime': movie_runtime,
    'movie_agerating': movie_agerating,
    'production_country': movie_productioncountries,
    'streaming_service': movie_streaming_services
}

movie_data=pd.DataFrame(info_dict)

In [209]:
movie_data

Unnamed: 0,movie_link,movie_title,movie_year,movie_genre,imdb_rating,movie_runtime,movie_agerating,production_country,streaming_service
0,https://www.justwatch.com/in/movie/kill-2024,Kill (2024),2024,"Drama, Mystery & Thriller, Action & Adventure,...",7.7 (19k),1h 45min,A,"United States, India",Apple TV+
1,https://www.justwatch.com/in/movie/munjha,Munjya (2024),2024,"Horror, Comedy",7.2 (15k),2h 3min,A,India,Apple TV+
2,https://www.justwatch.com/in/movie/maharaja-2024,Maharaja (2024),2024,"Mystery & Thriller, Action & Adventure, Crime,...",8.6 (39k),2h 30min,A,India,"Netflix , Bookmyshow , Apple TV+"
3,https://www.justwatch.com/in/movie/project-k,Kalki 2898-AD (2024),2024,"Drama, Fantasy, Science-Fiction, Mystery & Thr...",7.6 (43k),3h 1min,UA,India,"Amazon Prime Video , Amazon Prime Video , Netf..."
4,https://www.justwatch.com/in/movie/deadpool-3,Deadpool & Wolverine (2024),2024,"Action & Adventure, Comedy, Science-Fiction",8.0 (233k),2h 8min,A,United States,"Bookmyshow , Apple TV+"
...,...,...,...,...,...,...,...,...,...
95,https://www.justwatch.com/in/movie/777-charlie,777 Charlie (2022),2022,"Comedy, Drama, Action & Adventure",8.7 (41k),2h 46min,UA,India,"Amazon Prime Video , Apple TV , Amazon Prime V..."
96,https://www.justwatch.com/in/movie/memories-of...,Memories of Murder (2003),2003,"Crime, Drama, Mystery & Thriller",8.1 (222k),2h 11min,UA,South Korea,Apple TV+
97,https://www.justwatch.com/in/movie/kingsman-th...,Kingsman: The Secret Service (2014),2014,"Action & Adventure, Crime, Comedy, Mystery & T...",7.7 (726k),2h 9min,A,"United Kingdom, United States","Apple TV , Hotstar , Netflix , Apple TV+ , Ama..."
98,https://www.justwatch.com/in/movie/guruvayoor-...,Guruvayoor Ambalanadayil (2024),2024,Comedy,6.6 (2.8k),2h 12min,A,India,"Hotstar , Bookmyshow , Apple TV+"
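
Each movie cell above downloads every title page again to read a single field, so 100 titles cost several hundred requests. A possible alternative is to parse all fields from one download per page. The sketch below reuses the tag and class names from the cells above; `parse_title_page` is a hypothetical helper, and the HTML fragment is a made-up stand-in, not the real JustWatch markup.

```python
from bs4 import BeautifulSoup


def parse_title_page(html):
    """Extract several fields from one title page's HTML in a single parse."""
    soup = BeautifulSoup(html, "html.parser")

    # Index the 'detail-infos' blocks by their heading text ('Genres', 'Runtime', ...)
    blocks = {}
    for block in soup.find_all("div", class_="detail-infos"):
        heading = block.find("h3")
        if heading:
            blocks[heading.get_text(strip=True)] = block

    def detail(heading, tag):
        node = blocks[heading].find(tag) if heading in blocks else None
        return node.get_text(strip=True) if node else "NA"

    def text_of(node):
        return node.get_text(strip=True) if node else "NA"

    return {
        "title": text_of(soup.find("h1")),
        "year": text_of(soup.find("span", class_="release-year")).strip("()"),
        "imdb_rating": text_of(soup.find("span", class_="imdb-score")),
        "genre": detail("Genres", "span"),
        "runtime": detail("Runtime", "div"),
        "age_rating": detail("Age rating", "div"),
        "services": [img.get("alt", "") for img in soup.find_all("img", class_="offer__icon")],
    }


# Made-up fragment for illustration only
sample_html = """
<h1> Kill (2024)</h1>
<span class="release-year">(2024)</span>
<span class="imdb-score"> 7.7 (19k) </span>
<div class="detail-infos"><h3>Genres</h3><div><span>Drama, Crime</span></div></div>
<div class="detail-infos"><h3> Runtime </h3><div>1h 45min</div></div>
<img class="offer__icon" alt="Netflix"/>
"""
info = parse_title_page(sample_html)
print(info)
```

Because every field starts from a default of `'NA'` within the function, a value missing on one page cannot carry over from the previous page.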


In [186]:
# Save the movie DataFrame to a CSV file (index=False avoids writing an extra unnamed index column)
movie_data.to_csv('movie_data.csv', index=False)

## **Scraping TV Show Data**

In [187]:
# Specify the URL from which TV show data will be fetched
tv_url='https://www.justwatch.com/in/tv-shows?release_year_from=2000'

# The request was returning a 403 error; sending these browser-like headers (found online) avoids it
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
}

# Sending an HTTP GET request to the URL
response=requests.get(tv_url, headers=headers)

# Parsing the HTML content using BeautifulSoup with the 'html.parser'
soup=BeautifulSoup(response.text,'html.parser')

# Printing the prettified HTML content
print(soup.prettify())

<!DOCTYPE html>
<html data-vue-meta="%7B%22dir%22:%7B%22ssr%22:%22ltr%22%7D,%22lang%22:%7B%22ssr%22:%22en%22%7D%7D" data-vue-meta-server-rendered="" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
  <meta charset="utf-8" data-vue-meta="ssr"/>
  <meta content="IE=edge" data-vue-meta="ssr" httpequiv="X-UA-Compatible"/>
  <meta content="viewport-fit=cover, width=device-width, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0, user-scalable=no" data-vue-meta="ssr" name="viewport"/>
  <meta content="JustWatch" data-vue-meta="ssr" property="og:site_name"/>
  <meta content="794243977319785" data-vue-meta="ssr" property="fb:app_id"/>
  <meta content="/appassets/img/JustWatch_logo_with_claim.png" data-vmid="og:image" data-vue-meta="ssr" property="og:image"/>
  <meta content="606" data-vmid="og:image:width" data-vue-meta="ssr" property="og:image:width"/>
  <meta content="302" data-vmid="og:image:height" data-vue-meta="ssr" pro

## **Fetching TV Show URLs**

In [188]:
# Write Your Code here

tvshow_url=[]

for x in soup.find_all('a',class_="title-list-grid__item--link"):
  tvshow_url.append('https://www.justwatch.com'+x['href'])

print(len(tvshow_url))
print(tvshow_url)

100
['https://www.justwatch.com/in/tv-show/mirzapur', 'https://www.justwatch.com/in/tv-show/house-of-the-dragon', 'https://www.justwatch.com/in/tv-show/adams-sweet-agony', 'https://www.justwatch.com/in/tv-show/the-boys', 'https://www.justwatch.com/in/tv-show/gyaarah-gyaarah', 'https://www.justwatch.com/in/tv-show/sweet-home', 'https://www.justwatch.com/in/tv-show/game-of-thrones', 'https://www.justwatch.com/in/tv-show/panchayat', 'https://www.justwatch.com/in/tv-show/apharan', 'https://www.justwatch.com/in/tv-show/x-x-x-uncensored', 'https://www.justwatch.com/in/tv-show/attack-on-titan', 'https://www.justwatch.com/in/tv-show/shogun-2024', 'https://www.justwatch.com/in/tv-show/shekhar-home', 'https://www.justwatch.com/in/tv-show/batman-caped-crusader', 'https://www.justwatch.com/in/tv-show/tribhuvan-mishra-ca-topper', 'https://www.justwatch.com/in/tv-show/demon-slayer-kimetsu-no-yaiba', 'https://www.justwatch.com/in/tv-show/elite', 'https://www.justwatch.com/in/tv-show/presumed-innocent

## **Fetching TV Show Title Details**

In [217]:
# Write Your Code here
import time
tvshow_title = []

for url in tvshow_url:
  try:
    response = requests.get(url, headers = headers)
    soup = BeautifulSoup(response.text,'html.parser')
    title=soup.find('h1',class_='title-detail-hero__details__title').text.strip()
  except:
    title='NA'

  tvshow_title.append(title)
  time.sleep(1)

print(tvshow_title)
print(len(tvshow_title))

['Mirzapur (2018)', 'House of the Dragon (2022)', "Adam's Sweet Agony (2024)", 'The Boys (2019)', 'Gyaarah Gyaarah (2024)', 'Sweet Home (2020)', 'Game of Thrones (2011)', 'Panchayat (2020)', 'Apharan (2018)', 'XXX: Uncensored (2018)', 'Attack on Titan (2013)', 'Shōgun (2024)', 'Shekhar Home (2024)', 'Batman: Caped Crusader (2024)', 'Tribhuvan Mishra CA Topper (2024)', 'Demon Slayer: Kimetsu no Yaiba (2019)', 'Elite (2018)', 'Presumed Innocent (2024)', 'The Umbrella Academy (2019)', 'Y: The Last Man (2021)', 'Shahmaran (2023)', 'Mastram (2020)', 'Mad Men (2007)', 'The Bear (2022)', 'Money Heist (2017)', 'Bigg Boss OTT (2021)', 'Bigg Boss (2006)', 'Asur: Welcome to Your Dark Side (2020)', 'Aashram (2020)', 'Gullak (2019)', "A Good Girl's Guide to Murder (2024)", 'Farzi (2023)', 'Unsolved Mysteries (2020)', 'Breaking Bad (2008)', 'Stranger Things (2016)', 'The Family Man (2019)', 'The Rookie (2018)', 'College Romance (2018)', 'Evil (2019)', 'Scam 1992: The Harshad Mehta Story (2020)', 'Da

## **Fetching Release Year**

In [213]:
# Write Your Code here
import time
tvshow_year=[]

for url in tvshow_url:
  try:
    response=requests.get(url,headers=headers)
    soup=BeautifulSoup(response.text,'html.parser')
    year=soup.find_all('span',class_='release-year')[0].text.strip('()')
  except:
    year='NA'

  tvshow_year.append(year)
  time.sleep(1)

print(tvshow_year)
print(len(tvshow_year))

['2018', '2022', '2024', '2019', '2024', '2020', '2011', '2020', '2018', '2018', '2013', '2024', '2024', '2024', '2024', '2019', '2018', '2024', '2019', '2021', '2023', '2020', '2007', '2022', '2017', '2021', '2006', '2020', '2020', '2019', '2024', '2023', '2020', '2008', '2016', '2019', '2018', '2018', '2019', '2020', '2017', '2024', '2024', '2019', '2018', '2022', '2018', '2007', '2020', '2014', '2022', '2014', '2018', '2010', '2024', '2020', '2013', '2008', '2002', '2024', '2024', '2024', '2017', '2005', '2022', '2010', '2021', '2023', '2019', '2022', '2020', '2017', '2024', '2024', '2010', '2009', '2011', '2005', '2023', '2021', '2013', '2022', '2018', '2004', '2004', '2019', '2024', '2016', '2021', '2024', '2024', '2021', '2019', '2023', '2020', '2009', '2019', '2024', '2020', '2016']
100


## **Fetching TV Show Genre Details**

In [191]:
# Fetch each TV show page and read the text under the 'Genres' heading
import time
tvshow_genre = []

for url in tvshow_url:
  genre = 'NA'  # reset per show so a missing field never reuses the previous value
  try:
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    for x in soup.find_all('div', class_='detail-infos'):
      if x.find_all('h3')[0].text == 'Genres':
        genre = x.find_all('span')[0].text
  except Exception:
    genre = 'NA'

  tvshow_genre.append(genre)
  time.sleep(1)

print(len(tvshow_genre))
print(tvshow_genre)

100
['Action & Adventure, Drama, Crime, Mystery & Thriller', 'Action & Adventure, Science-Fiction, Drama, Fantasy, Romance', 'Animation', 'Science-Fiction, Action & Adventure, Drama, Comedy, Crime', 'Fantasy, Drama, Science-Fiction', 'Science-Fiction, Horror, Mystery & Thriller, Drama, Fantasy', 'Action & Adventure, Science-Fiction, Drama, Fantasy', 'Comedy, Drama', 'Drama, Action & Adventure, Crime, Mystery & Thriller', 'Comedy, Drama, Romance', 'Animation, Action & Adventure, Drama, Fantasy, Horror, Science-Fiction', 'War & Military, Drama, History', 'Drama, Crime', 'Action & Adventure, Kids & Family, Fantasy, Science-Fiction, Animation, Crime', 'Mystery & Thriller, Comedy, Crime, Drama', 'Animation, Action & Adventure, Science-Fiction, Fantasy, Mystery & Thriller', 'Drama, Crime, Mystery & Thriller', 'Crime, Drama', 'Fantasy, Science-Fiction, Action & Adventure, Comedy, Drama', 'Fantasy, Science-Fiction, Action & Adventure, Comedy, Drama', 'Mystery & Thriller, Action & Adventure, Dr

## **Fetching IMDb Rating Details**

In [222]:
# Fetch each TV show page and read the IMDb score, e.g. '8.4 (87k)'
import time
tvshow_rating = []

for url in tvshow_url:
  imdbrating = 'NA'  # reset per show so a missing score never reuses the previous value
  try:
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    imdbratings = soup.find_all('span', class_='imdb-score')
    if imdbratings:
      # To keep only the numeric rating, use imdbratings[0].text.strip().split()[0]
      imdbrating = imdbratings[0].text.strip()
  except Exception:
    imdbrating = 'NA'

  tvshow_rating.append(imdbrating)
  time.sleep(1)

print(len(tvshow_rating))
print(tvshow_rating)

100
['8.4 (87k)', '8.4 (437k)', 'NA', '8.7 (712k)', '8.3 (2.5k)', '7.3 (34k)', '9.2 (2m)', '9.0 (96k)', '8.2 (20k)', '4.8 (798)', '9.1 (536k)', '8.7 (164k)', '8.4 (7.6k)', '7.3 (9.6k)', '7.0 (2.2k)', '8.6 (163k)', '7.2 (93k)', '7.7 (41k)', '7.9 (286k)', '6.0 (16k)', '5.3 (12k)', '6.9 (2.5k)', '8.7 (264k)', '8.6 (239k)', '8.2 (543k)', '1.9 (1.2k)', '3.6 (4.1k)', '8.5 (67k)', '6.6 (58k)', '9.1 (24k)', '6.8 (10k)', '8.4 (47k)', '7.3 (13k)', '9.5 (2m)', '8.7 (1m)', '8.7 (100k)', '8.0 (80k)', '8.3 (29k)', '7.8 (40k)', '9.2 (159k)', '8.7 (457k)', '7.2 (3.9k)', '8.4 (244k)', '9.0 (85k)', '3.4 (3k)', '8.0 (75k)', '7.7 (307k)', '8.7 (177k)', '8.6 (128k)', '7.5 (371k)', '7.7 (17k)', '8.9 (668k)', '8.5 (211k)', '8.1 (1m)', '7.9 (1.2k)', '6.9 (120k)', '8.8 (668k)', '3.9 (606)', '8.4 (133k)', '7.7 (39k)', '6.6 (13k)', '7.5 (137k)', '8.0 (118k)', '8.3 (738k)', '7.7 (81k)', '9.1 (1m)', '8.2 (53k)', '8.7 (557k)', '8.3 (252k)', '7.5 (70k)', '7.2 (13k)', '7.7 (116k)', '6.0 (203)', '6.3 (5.7k)', '8.5 (26

## **Fetching Age Rating Details**

In [193]:
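# Fetch each TV show page and read the text under the 'Age rating' heading
import time
tvshow_agerating = []

for url in tvshow_url:
  age_rating = 'NA'  # reset per show so a missing field never reuses the previous value
  try:
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    for x in soup.find_all('div', class_='detail-infos'):
      if x.find_all('h3')[0].text == 'Age rating':
        age_rating = x.find_all('div')[0].text
  except Exception:
    age_rating = 'NA'  # the original assigned to `runtime` here by mistake

  tvshow_agerating.append(age_rating)
  time.sleep(1)

print(len(tvshow_agerating))
print(tvshow_agerating)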

100
['A', 'A', 'A', 'A', 'A', 'A', 'U', 'U', 'U', 'U', 'UA', 'UA', 'UA', 'UA', 'UA', 'UA', 'A', 'A', 'A', 'A', 'A', 'A', 'U', 'U', 'U', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'U', 'U', 'U', 'U', 'U', 'A', 'U', 'U', 'U', 'A', 'U', 'U', 'A', 'A', 'A', 'U', 'U', 'U', 'U', 'U', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'U', 'U', 'U', 'U', 'U', 'U', 'U', 'U', 'U', 'U', 'U', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'U', 'U', 'A', 'A', 'U']


## **Fetching Production Country details**

In [194]:
# Fetch each TV show page and read the text under the 'Production country' heading
import time
tvshow_productioncountries = []

for url in tvshow_url:
  production_country = 'NA'  # reset per show so a missing field never reuses the previous value
  try:
    content = requests.get(url, headers=headers)
    soup = BeautifulSoup(content.text, 'html.parser')
    for x in soup.find_all('div', class_='detail-infos'):
      if x.find_all('h3')[0].text == ' Production country ':
        production_country = x.find_all('div')[0].text
  except Exception:
    production_country = 'NA'

  tvshow_productioncountries.append(production_country)
  time.sleep(1)

print(len(tvshow_productioncountries))
print(tvshow_productioncountries)

100
['India', 'United States', 'Japan', 'United States', 'India', 'South Korea', 'United States', 'India', 'India', 'India', 'Japan', 'United States', 'India', 'United States', 'India', 'Japan', 'Spain', 'United States', 'United States', 'United States', 'Turkey', 'India', 'United States', 'United States', 'Spain', 'India', 'India', 'India', 'India', 'India', 'Germany, United Kingdom', 'India', 'United States', 'United States', 'United States', 'India', 'United States', 'India', 'United States', 'India', 'Germany', 'Spain', 'United States', 'United States', 'India', 'United States', 'United States', 'Japan', 'Japan, United States', 'United States', 'United States', 'United States', 'United States', 'United States', 'India', 'United States', 'United Kingdom', 'India', 'India', 'United States', 'Germany, Italy, United States', 'United States', 'United States', 'United States', 'United States', 'United Kingdom', 'United States', 'United States', 'United States', 'South Korea', 'United Sta

## **Fetching Streaming Service details**

In [238]:
# Write Your Code here
import time
tvshow_streaming_services=[]

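for url in tvshow_url:
    names = []  # reset per URL
    try:
        response = requests.get(url, headers=headers)
        soup = BeautifulSoup(response.text, 'html.parser')
        # the alt text of each offer icon is the streaming service name
        names = [x['alt'] for x in soup.find_all('img', class_='offer__icon')]
    except Exception:
        # use a list: joining the bare string 'NA' would produce 'N , A'
        names = ['NA']

    tvshow_streaming_services.append(" , ".join(names))
    time.sleep(1)  # pause between requests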

print(len(tvshow_streaming_services))
print(tvshow_streaming_services)

100
['Amazon Prime Video , Amazon Prime Video', 'Jio Cinema', 'Amazon Prime Video , Amazon Prime Video , Anime Times Amazon Channel', 'Amazon Prime Video , Amazon Prime Video', 'Zee5', 'Netflix', 'Jio Cinema', 'Amazon Prime Video , Amazon Prime Video', 'Jio Cinema , Alt Balaji', 'Alt Balaji', 'Amazon Prime Video , Amazon Prime Video , Anime Times Amazon Channel', 'Hotstar', 'Jio Cinema', 'Amazon Prime Video , Amazon Prime Video', 'Netflix', 'Netflix , Crunchyroll Amazon Channel', 'Netflix', 'Apple TV Plus , Apple TV+', 'Netflix', '', 'Netflix', '', 'Netflix , Lionsgate Play , Lionsgate Play Apple TV Channel , Lionsgate Play Amazon Channel', 'Hotstar', 'Netflix', 'Jio Cinema', '', 'Jio Cinema', 'MX Player', 'Sony Liv', 'Netflix', 'Amazon Prime Video , Amazon Prime Video', 'Netflix', 'Netflix', 'Netflix', 'Amazon Prime Video , Amazon Prime Video', '', 'Sony Liv', 'Jio Cinema', 'Sony Liv', 'Netflix', 'Netflix', 'Amazon Prime Video , Amazon Prime Video', 'Netflix', 'Alt Balaji', 'Jio Cinem
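
The output above repeats some services for the same title (for example `'Amazon Prime Video , Amazon Prime Video'`). A minimal sketch of one way to drop such repeats before joining, assuming the names arrive as a plain list of strings; `join_unique` is a hypothetical helper, not part of the notebook:

```python
def join_unique(names, sep=' , '):
    """Join service names, dropping blanks and repeats while keeping first-seen order."""
    cleaned = (name.strip() for name in names)
    # dict.fromkeys keeps insertion order, so the first occurrence of each name wins
    return sep.join(dict.fromkeys(name for name in cleaned if name))


print(join_unique(['Amazon Prime Video', 'Amazon Prime Video', 'Anime Times Amazon Channel']))
```

Applied inside the loop, this would replace the plain `" , ".join(names)` call.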

## **Fetching Duration Details**

In [239]:
# Write Your Code here
import time
tvshow_runtime = []

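for url in tvshow_url:
    # reset per URL so a page without a runtime never reuses the previous value
    runtime = 'NA'
    try:
        response = requests.get(url, headers=headers)
        soup = BeautifulSoup(response.text, 'html.parser')
        for x in soup.find_all('div', class_='detail-infos'):
            heading = x.find('h3')
            # compare stripped text so surrounding whitespace does not matter
            if heading and heading.text.strip() == 'Runtime':
                runtime = x.find('div').text.strip()
                break
    except Exception:
        runtime = 'NA'

    tvshow_runtime.append(runtime)
    time.sleep(1)  # pause between requests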

print(len(tvshow_runtime))
print(tvshow_runtime)

100
['50min', '1h 0min', '3min', '1h 1min', '43min', '58min', '58min', '35min', '24min', '22min', '25min', '59min', '42min', '25min', '57min', '26min', '49min', '43min', '52min', '51min', '49min', '28min', '49min', '34min', '50min', '1h 28min', '1h 16min', '47min', '43min', '30min', '44min', '56min', '45min', '47min', '1h 1min', '45min', '43min', '31min', '49min', '52min', '56min', '46min', '59min', '43min', '44min', '38min', '48min', '24min', '24min', '45min', '46min', '1h 1min', '35min', '46min', '26min', '31min', '58min', '45min', '23min', '52min', '53min', '56min', '43min', '24min', '50min', '1h 28min', '47min', '58min', '58min', '1h 1min', '56min', '19min', '1h 4min', '55min', '54min', '21min', '44min', '50min', '45min', '55min', '43min', '50min', '20min', '43min', '44min', '1h 5min', '43min', '1h 1min', '21min', '32min', '1h 2min', '39min', '57min', '52min', '43min', '42min', '30min', '46min', '34min', '24min']
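
The runtimes are scraped as strings such as `'50min'` and `'1h 28min'`, which cannot be averaged or compared directly. A small sketch of converting them to minutes, assuming only the two shapes seen in the output above; `runtime_to_minutes` is a hypothetical helper:

```python
import re


def runtime_to_minutes(text):
    """Convert '1h 28min' or '50min' to total minutes; return None if unparseable."""
    match = re.fullmatch(r'\s*(?:(\d+)h)?\s*(?:(\d+)min)?\s*', text)
    # reject strings like 'NA' and empty strings where neither part was found
    if not match or not any(match.groups()):
        return None
    hours, minutes = match.groups()
    return int(hours or 0) * 60 + int(minutes or 0)


print([runtime_to_minutes(r) for r in ['50min', '1h 0min', '1h 28min', 'NA']])
```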


## **Creating TV Show DataFrame**

In [240]:
# Write Your Code here
info_dict = {
    'tvshow_link': tvshow_url,
    'tvshow_title': tvshow_title,
    'tvshow_year': tvshow_year,
    'tvshow_genre': tvshow_genre,
    'imdb_rating': tvshow_rating,
    'tvshow_runtime': tvshow_runtime,
    'tvshow_agerating': tvshow_agerating,
    'production_country': tvshow_productioncountries,
    'streaming_service': tvshow_streaming_services
}

tv_data=pd.DataFrame(info_dict)

In [241]:
tv_data

Unnamed: 0,tvshow_link,tvshow_title,tvshow_year,tvshow_genre,imdb_rating,tvshow_runtime,tvshow_agerating,production_country,streaming_service
0,https://www.justwatch.com/in/tv-show/mirzapur,Mirzapur (2018),2018,"Action & Adventure, Drama, Crime, Mystery & Th...",8.4 (87k),50min,A,India,"Amazon Prime Video , Amazon Prime Video"
1,https://www.justwatch.com/in/tv-show/house-of-...,House of the Dragon (2022),2022,"Action & Adventure, Science-Fiction, Drama, Fa...",8.4 (437k),1h 0min,A,United States,Jio Cinema
2,https://www.justwatch.com/in/tv-show/adams-swe...,Adam's Sweet Agony (2024),2024,Animation,,3min,A,Japan,"Amazon Prime Video , Amazon Prime Video , Anim..."
3,https://www.justwatch.com/in/tv-show/the-boys,The Boys (2019),2019,"Science-Fiction, Action & Adventure, Drama, Co...",8.7 (712k),1h 1min,A,United States,"Amazon Prime Video , Amazon Prime Video"
4,https://www.justwatch.com/in/tv-show/gyaarah-g...,Gyaarah Gyaarah (2024),2024,"Fantasy, Drama, Science-Fiction",8.3 (2.5k),43min,A,India,Zee5
...,...,...,...,...,...,...,...,...,...
95,https://www.justwatch.com/in/tv-show/the-vampi...,The Vampire Diaries (2009),2009,"Drama, Science-Fiction, Fantasy, Horror, Roman...",7.7 (355k),42min,U,United States,"Amazon Prime Video , Netflix , Amazon Prime Video"
96,https://www.justwatch.com/in/tv-show/ghosts,Ghosts (2019),2019,"Comedy, Fantasy, Science-Fiction",8.4 (28k),30min,U,United Kingdom,"Amazon Prime Video , BBC Player Amazon Channel..."
97,https://www.justwatch.com/in/tv-show/bad-monkey,Bad Monkey (2024),2024,"Comedy, Drama, Crime",7.3 (1.8k),46min,A,United States,"Apple TV Plus , Apple TV+"
98,https://www.justwatch.com/in/tv-show/dark-desire,Dark Desire (2020),2020,"Mystery & Thriller, Drama",6.5 (9.5k),34min,A,Mexico,Netflix


In [242]:
# save the TV show data to a CSV file (index=False avoids writing an extra unnamed index column)
tv_data.to_csv('tv_data.csv', index=False)

## **Task 2 :- Data Filtering & Analysis**

In [260]:
# Write Your Code here

# work on copies so the original DataFrames stay unchanged
movie_data1= movie_data.copy()
tv_data1= tv_data.copy()

In [261]:
movie_data1

Unnamed: 0,movie_link,movie_title,movie_year,movie_genre,imdb_rating,movie_runtime,movie_agerating,production_country,streaming_service
0,https://www.justwatch.com/in/movie/kill-2024,Kill (2024),2024,"Drama, Mystery & Thriller, Action & Adventure,...",7.7 (19k),1h 45min,A,"United States, India",Apple TV+
1,https://www.justwatch.com/in/movie/munjha,Munjya (2024),2024,"Horror, Comedy",7.2 (15k),2h 3min,A,India,Apple TV+
2,https://www.justwatch.com/in/movie/maharaja-2024,Maharaja (2024),2024,"Mystery & Thriller, Action & Adventure, Crime,...",8.6 (39k),2h 30min,A,India,"Netflix , Bookmyshow , Apple TV+"
3,https://www.justwatch.com/in/movie/project-k,Kalki 2898-AD (2024),2024,"Drama, Fantasy, Science-Fiction, Mystery & Thr...",7.6 (43k),3h 1min,UA,India,"Amazon Prime Video , Amazon Prime Video , Netf..."
4,https://www.justwatch.com/in/movie/deadpool-3,Deadpool & Wolverine (2024),2024,"Action & Adventure, Comedy, Science-Fiction",8.0 (233k),2h 8min,A,United States,"Bookmyshow , Apple TV+"
...,...,...,...,...,...,...,...,...,...
95,https://www.justwatch.com/in/movie/777-charlie,777 Charlie (2022),2022,"Comedy, Drama, Action & Adventure",8.7 (41k),2h 46min,UA,India,"Amazon Prime Video , Apple TV , Amazon Prime V..."
96,https://www.justwatch.com/in/movie/memories-of...,Memories of Murder (2003),2003,"Crime, Drama, Mystery & Thriller",8.1 (222k),2h 11min,UA,South Korea,Apple TV+
97,https://www.justwatch.com/in/movie/kingsman-th...,Kingsman: The Secret Service (2014),2014,"Action & Adventure, Crime, Comedy, Mystery & T...",7.7 (726k),2h 9min,A,"United Kingdom, United States","Apple TV , Hotstar , Netflix , Apple TV+ , Ama..."
98,https://www.justwatch.com/in/movie/guruvayoor-...,Guruvayoor Ambalanadayil (2024),2024,Comedy,6.6 (2.8k),2h 12min,A,India,"Hotstar , Bookmyshow , Apple TV+"


In [262]:
tv_data1

Unnamed: 0,tvshow_link,tvshow_title,tvshow_year,tvshow_genre,imdb_rating,tvshow_runtime,tvshow_agerating,production_country,streaming_service
0,https://www.justwatch.com/in/tv-show/mirzapur,Mirzapur (2018),2018,"Action & Adventure, Drama, Crime, Mystery & Th...",8.4 (87k),50min,A,India,"Amazon Prime Video , Amazon Prime Video"
1,https://www.justwatch.com/in/tv-show/house-of-...,House of the Dragon (2022),2022,"Action & Adventure, Science-Fiction, Drama, Fa...",8.4 (437k),1h 0min,A,United States,Jio Cinema
2,https://www.justwatch.com/in/tv-show/adams-swe...,Adam's Sweet Agony (2024),2024,Animation,,3min,A,Japan,"Amazon Prime Video , Amazon Prime Video , Anim..."
3,https://www.justwatch.com/in/tv-show/the-boys,The Boys (2019),2019,"Science-Fiction, Action & Adventure, Drama, Co...",8.7 (712k),1h 1min,A,United States,"Amazon Prime Video , Amazon Prime Video"
4,https://www.justwatch.com/in/tv-show/gyaarah-g...,Gyaarah Gyaarah (2024),2024,"Fantasy, Drama, Science-Fiction",8.3 (2.5k),43min,A,India,Zee5
...,...,...,...,...,...,...,...,...,...
95,https://www.justwatch.com/in/tv-show/the-vampi...,The Vampire Diaries (2009),2009,"Drama, Science-Fiction, Fantasy, Horror, Roman...",7.7 (355k),42min,U,United States,"Amazon Prime Video , Netflix , Amazon Prime Video"
96,https://www.justwatch.com/in/tv-show/ghosts,Ghosts (2019),2019,"Comedy, Fantasy, Science-Fiction",8.4 (28k),30min,U,United Kingdom,"Amazon Prime Video , BBC Player Amazon Channel..."
97,https://www.justwatch.com/in/tv-show/bad-monkey,Bad Monkey (2024),2024,"Comedy, Drama, Crime",7.3 (1.8k),46min,A,United States,"Apple TV Plus , Apple TV+"
98,https://www.justwatch.com/in/tv-show/dark-desire,Dark Desire (2020),2020,"Mystery & Thriller, Drama",6.5 (9.5k),34min,A,Mexico,Netflix


In [263]:
# check the number of rows and columns

print(movie_data1.shape)
print(tv_data1.shape)

(100, 9)
(100, 9)


In [264]:
# count null values per column
movie_data1.isnull().sum()

Unnamed: 0,0
movie_link,0
movie_title,0
movie_year,0
movie_genre,0
imdb_rating,0
movie_runtime,0
movie_agerating,0
production_country,0
streaming_service,0


In [248]:
tv_data1.isnull().sum()

Unnamed: 0,0
tvshow_link,0
tvshow_title,0
tvshow_year,0
tvshow_genre,0
imdb_rating,0
tvshow_runtime,0
tvshow_agerating,0
production_country,0
streaming_service,0


In [249]:
# inspect column dtypes and memory usage
movie_data1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   movie_link          100 non-null    object
 1   movie_title         100 non-null    object
 2   movie_year          100 non-null    object
 3   movie_genre         100 non-null    object
 4   imdb_rating         100 non-null    object
 5   movie_runtime       100 non-null    object
 6   movie_agerating     100 non-null    object
 7   production_country  100 non-null    object
 8   streaming_service   100 non-null    object
dtypes: object(9)
memory usage: 7.2+ KB


In [250]:
tv_data1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   tvshow_link         100 non-null    object
 1   tvshow_title        100 non-null    object
 2   tvshow_year         100 non-null    object
 3   tvshow_genre        100 non-null    object
 4   imdb_rating         100 non-null    object
 5   tvshow_runtime      100 non-null    object
 6   tvshow_agerating    100 non-null    object
 7   production_country  100 non-null    object
 8   streaming_service   100 non-null    object
dtypes: object(9)
memory usage: 7.2+ KB


In [265]:
# summary statistics (every column is object dtype, so this reports count, unique, top and freq)
movie_data1.describe()

Unnamed: 0,movie_link,movie_title,movie_year,movie_genre,imdb_rating,movie_runtime,movie_agerating,production_country,streaming_service
count,100,100,100,100,100,100,100,100,100
unique,100,100,18,80,99,60,3,16,53
top,https://www.justwatch.com/in/movie/kill-2024,Kill (2024),2024,Drama,7.1 (50k),2h 30min,UA,India,"Netflix , Apple TV+"
freq,1,1,60,6,2,5,54,55,12


In [254]:
tv_data1.describe(include='object')

Unnamed: 0,tvshow_link,tvshow_title,tvshow_year,tvshow_genre,imdb_rating,tvshow_runtime,tvshow_agerating,production_country,streaming_service
count,100,100,100,100,100,100,100,100,100
unique,100,100,20,69,99,42,3,15,28
top,https://www.justwatch.com/in/tv-show/mirzapur,Mirzapur (2018),2024,Drama,,43min,A,United States,Netflix
freq,1,1,20,6,2,10,58,47,28


## **Calculating Mean IMDb Ratings for both Movies and TV Shows**

In [201]:
# Write Your Code here
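
One possible approach, sketched on a small made-up sample rather than the scraped frames: the `imdb_rating` column holds strings such as `'8.4 (87k)'`, so the leading number has to be extracted before a mean can be taken. The `sample` frame and the `rating_value` column name are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample mirroring the 'imdb_rating' format shown above;
# the empty string stands in for a title with no rating.
sample = pd.DataFrame({'imdb_rating': ['8.4 (87k)', '8.7 (712k)', '', '6.5 (9.5k)']})

# Extract the leading number; rows without one become NaN.
sample['rating_value'] = pd.to_numeric(
    sample['imdb_rating'].str.extract(r'^\s*(\d+(?:\.\d+)?)', expand=False),
    errors='coerce',
)

mean_rating = sample['rating_value'].mean()  # NaN rows are skipped by default
print(round(mean_rating, 2))
```

The same two steps could be applied to `movie_data1` and `tv_data1` separately to get one mean per frame.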


## **Analyzing Top Genres**

In [202]:
# Write Your Code here
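
A sketch of counting genres, assuming the comma-separated format of the `tvshow_genre` and `movie_genre` columns shown earlier. The `sample` frame is made up for illustration.

```python
import pandas as pd

# Hypothetical sample in the same comma-separated format as the genre columns.
sample = pd.DataFrame({'tvshow_genre': [
    'Action & Adventure, Drama, Crime',
    'Drama, Fantasy',
    'Comedy, Drama, Crime',
]})

genre_counts = (
    sample['tvshow_genre']
    .str.split(',')   # one list of genres per title
    .explode()        # one row per (title, genre) pair
    .str.strip()      # remove the space left after each comma
    .value_counts()   # sorted from most to least frequent
)
print(genre_counts.head(3))
```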


In [203]:
# Let's visualize it using a word cloud


## **Finding Predominant Streaming Service**

In [204]:
# Write Your Code here
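
A sketch of finding the most common streaming service, assuming the `' , '`-separated format of the `streaming_service` column. Because a title can list the same service more than once, each service is counted at most once per title. The `sample` frame and `unique_services` helper are hypothetical.

```python
import pandas as pd

# Hypothetical sample in the same format as the 'streaming_service' column.
sample = pd.DataFrame({'streaming_service': [
    'Amazon Prime Video , Amazon Prime Video',
    'Netflix',
    'Netflix , Lionsgate Play',
    '',
]})


def unique_services(cell):
    """Split one cell into its distinct, non-empty service names."""
    return sorted({part.strip() for part in cell.split(',') if part.strip()})


service_counts = (
    sample['streaming_service']
    .apply(unique_services)  # list of distinct services per title
    .explode()               # titles with no services become NaN
    .value_counts()          # NaN is dropped by default
)
top_service = service_counts.idxmax()
print(top_service)
```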


In [205]:
# Let's visualize it using a word cloud


## **Task 3 :- Data Export**

In [206]:
# save the final DataFrame as Final Data in CSV format


In [207]:
# save the filtered data as Filter Data in CSV format
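
A sketch of the filter from the task description (released in the last 2 years, IMDb rating of 7 or higher) followed by a CSV export. Since only the release year is scraped, "last 2 years" is approximated here as `year >= current year - 2`, which is an assumption. `filter_recent_high_rated`, the `sample` frame, and the fixed reference date are all hypothetical; the CSV is written to an in-memory buffer so the example leaves no file behind.

```python
import io
from datetime import date

import pandas as pd


def filter_recent_high_rated(df, year_col, rating_col, today=None, years=2, min_rating=7.0):
    """Keep rows released within `years` of `today` whose rating is >= min_rating."""
    today = today or date.today()
    year = pd.to_numeric(df[year_col], errors='coerce')
    rating = pd.to_numeric(
        df[rating_col].str.extract(r'^\s*(\d+(?:\.\d+)?)', expand=False),
        errors='coerce',
    )
    # comparisons with NaN are False, so unparseable rows are dropped
    mask = (year >= today.year - years) & (rating >= min_rating)
    return df[mask]


# Hypothetical sample using the column formats shown above.
sample = pd.DataFrame({
    'movie_title': ['Kill (2024)', 'Memories of Murder (2003)', 'Guruvayoor Ambalanadayil (2024)'],
    'movie_year': ['2024', '2003', '2024'],
    'imdb_rating': ['7.7 (19k)', '8.1 (222k)', '6.6 (2.8k)'],
})

# A fixed reference date keeps the example reproducible; omit `today` to use the current date.
filtered = filter_recent_high_rated(sample, 'movie_year', 'imdb_rating', today=date(2024, 9, 1))

buffer = io.StringIO()
filtered.to_csv(buffer, index=False)  # pass a file path instead of a buffer to write to disk
csv_text = buffer.getvalue()
print(csv_text)
```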


# **Dataset Drive Link (View Access with Anyone) -**

# ***Congratulations!!! You have completed your Assignment.***