Write a program to download the data from the given API link and then extract the following attributes with proper formatting.

Link - http://api.tvmaze.com/singlesearch/shows?q=westworld&embed=episodes

Note - Write proper code comments wherever needed to make the code easy to understand.

Expected Output Data Attributes -
- id - int
- url - string
- name - string
- season - int
- number - int
- type - string
- airdate - date format
- airtime - 12-hour time format
- runtime - float
- average rating - float
- summary - string without HTML tags
- medium image link - string
- original image link - string
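The two formatting requirements (airdate as a date, airtime in 12-hour format) come down to a parse with `datetime.strptime` followed by a re-format with `strftime`. The sketch below assumes the API sends `airdate` as ISO `YYYY-MM-DD` and `airtime` as 24-hour `HH:MM`, as the notebook's own parsing code does, and it uses the `DD-MM-YYYY` output layout the notebook chose. Returning `None` for an empty string is an assumption about how to treat episodes with no listed date or time; `format_airdate` and `format_airtime` are hypothetical helper names.

```python
from datetime import datetime
from typing import Optional


def format_airdate(raw: str) -> Optional[str]:
    """Convert an ISO 'YYYY-MM-DD' date to 'DD-MM-YYYY'; None if the value is empty."""
    if not raw:
        return None
    return datetime.strptime(raw, "%Y-%m-%d").strftime("%d-%m-%Y")


def format_airtime(raw: str) -> Optional[str]:
    """Convert a 24-hour 'HH:MM' time to 12-hour 'hh:MM AM/PM'; None if empty."""
    if not raw:
        return None
    # %I is the zero-padded 12-hour clock (00:30 -> 12); %p gives AM/PM
    # in the default C locale (it is locale-dependent in general).
    return datetime.strptime(raw, "%H:%M").strftime("%I:%M %p")


print(format_airdate("2016-10-02"))  # 02-10-2016
print(format_airtime("21:00"))       # 09:00 PM
print(format_airtime(""))            # None
```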

In [20]:
# Libraries for HTTP requests, date/time parsing, tabular data and HTML parsing
import requests as req
from datetime import datetime
import pandas as pd
from bs4 import BeautifulSoup

In [25]:
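url = "http://api.tvmaze.com/singlesearch/shows?q=westworld&embed=episodes"
# Download the show record with its episodes embedded; raise an error on HTTP failure
resp = req.get(url, timeout=10)
resp.raise_for_status()
data_json = resp.json()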
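Instead of hard-coding the query string, the URL can be assembled from its parameters so that show names with spaces or punctuation are encoded correctly. The sketch below is a dependency-free variant that uses the standard library's `urllib` in place of `requests`; `build_search_url` and `fetch_show` are hypothetical helper names, and the 10-second timeout is an arbitrary choice. `fetch_show` needs network access and is not called in the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://api.tvmaze.com/singlesearch/shows"


def build_search_url(query: str, embed: str = "episodes") -> str:
    """Build the single-search URL, URL-encoding the query parameters."""
    return f"{BASE_URL}?{urlencode({'q': query, 'embed': embed})}"


def fetch_show(query: str, timeout: float = 10.0) -> dict:
    """Download and decode the show JSON (requires network access)."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    # so a failed lookup surfaces as an exception rather than bad data.
    with urlopen(build_search_url(query), timeout=timeout) as resp:
        return json.load(resp)


print(build_search_url("westworld"))
# http://api.tvmaze.com/singlesearch/shows?q=westworld&embed=episodes
```

`urlencode` turns spaces into `+`, so a query such as `"game of thrones"` becomes `q=game+of+thrones`.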

In [26]:
data_json

{'id': 1371,
 'url': 'https://www.tvmaze.com/shows/1371/westworld',
 'name': 'Westworld',
 'type': 'Scripted',
 'language': 'English',
 'genres': ['Drama', 'Science-Fiction', 'Western'],
 'status': 'Ended',
 'runtime': 60,
 'averageRuntime': 63,
 'premiered': '2016-10-02',
 'ended': '2022-08-14',
 'officialSite': 'http://www.hbo.com/westworld',
 'schedule': {'time': '21:00', 'days': ['Sunday']},
 'rating': {'average': 8.2},
 'weight': 99,
 'network': {'id': 8,
  'name': 'HBO',
  'country': {'name': 'United States',
   'code': 'US',
   'timezone': 'America/New_York'},
  'officialSite': 'https://www.hbo.com/'},
 'webChannel': None,
 'dvdCountry': None,
 'externals': {'tvrage': 37537, 'thetvdb': 296762, 'imdb': 'tt0475784'},
 'image': {'medium': 'https://static.tvmaze.com/uploads/images/medium_portrait/445/1113927.jpg',
  'original': 'https://static.tvmaze.com/uploads/images/original_untouched/445/1113927.jpg'},
 'summary': '<p><b>Westworld</b> is a dark odyssey about the dawn of artifici

In [27]:
# Create lists to store the extracted attributes
ids = []
urls = []
names = []
seasons = []
numbers = []
types = []
airdates = []
airtimes = []
runtimes = []
average_ratings = []
summaries = []
medium_images = []
original_images = []

In [28]:
# Iterate over each episode in the embedded "episodes" list
for data in data_json["_embedded"]["episodes"]:
    # Extract the desired attributes from each episode
    ids.append(data["id"])
    urls.append(data["url"])
    names.append(data["name"])
    seasons.append(data["season"])
    numbers.append(data["number"])
    types.append(data["type"])
    # Reformat the ISO airdate (YYYY-MM-DD) as DD-MM-YYYY
    airdates.append(datetime.strptime(data["airdate"], "%Y-%m-%d").strftime("%d-%m-%Y"))
    # Convert the 24-hour airtime (HH:MM) to 12-hour format, e.g. 21:00 -> 09:00 PM
    airtimes.append(datetime.strptime(data["airtime"], "%H:%M").strftime("%I:%M %p"))
    runtimes.append(data["runtime"])
    average_ratings.append(data["rating"]["average"])
    # Strip HTML tags such as <p> and <b> from the summary, keeping only the text
    summaries.append(BeautifulSoup(data["summary"], "html.parser").get_text())
    medium_images.append(data["image"]["medium"])
    original_images.append(data["image"]["original"])
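The loop above assumes every episode has an air time, a rating, an image and a summary, which holds for the 36 Westworld episodes but may not hold for every show. Below is a minimal, more defensive sketch of the per-episode extraction. It assumes (not confirmed by the source) that `rating.average`, `image` or `summary` may be `null` and that `airtime` may be an empty string, and it swaps BeautifulSoup for the standard library's `html.parser.HTMLParser` to strip tags. `TextExtractor`, `strip_html` and `extract_episode` are hypothetical names.

```python
from datetime import datetime
from html.parser import HTMLParser
from typing import Any, Dict, List, Optional


class TextExtractor(HTMLParser):
    """Collect only the text content of an HTML fragment, dropping the tags."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True turns &amp; into &
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(fragment: Optional[str]) -> Optional[str]:
    """Return the plain text of an HTML fragment, or None if there is none."""
    if fragment is None:
        return None
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


def extract_episode(ep: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one episode dict into the required attributes, tolerating gaps."""
    rating = (ep.get("rating") or {}).get("average")
    image = ep.get("image") or {}  # image may be null
    airdate, airtime, runtime = ep.get("airdate"), ep.get("airtime"), ep.get("runtime")
    return {
        "id": ep["id"],
        "url": ep["url"],
        "name": ep["name"],
        "season": ep["season"],
        "number": ep["number"],
        "type": ep["type"],
        "airdate": datetime.strptime(airdate, "%Y-%m-%d").strftime("%d-%m-%Y") if airdate else None,
        "airtime": datetime.strptime(airtime, "%H:%M").strftime("%I:%M %p") if airtime else None,
        "runtime": float(runtime) if runtime is not None else None,
        "average_rating": float(rating) if rating is not None else None,
        "summary": strip_html(ep.get("summary")),
        "medium_image_link": image.get("medium"),
        "original_image_link": image.get("original"),
    }
```

Keeping one dict per episode also removes the need for thirteen parallel lists: `pd.DataFrame([extract_episode(ep) for ep in data_json["_embedded"]["episodes"]])` builds the same columns, and each row's fields cannot drift out of step.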

In [29]:
# Create a DataFrame from the extracted attributes
df = pd.DataFrame({
    "id": ids,
    "url": urls,
    "name": names,
    "season": seasons,
    "number": numbers,
    "type": types,
    "airdate": airdates,
    "airtime": airtimes,
    "runtime": runtimes,
    "average_rating": average_ratings,
    "summary": summaries,
    "medium_image_link": medium_images,
    "original_image_link": original_images
})

In [30]:
df.head()

,id,url,name,season,number,type,airdate,airtime,runtime,average_rating,summary,medium_image_link,original_image_link
0,869671,https://www.tvmaze.com/episodes/869671/westwor...,The Original,1,1,regular,02-10-2016,09:00 PM,68,8.0,A woman named Dolores is a free spirit in the ...,https://static.tvmaze.com/uploads/images/mediu...,https://static.tvmaze.com/uploads/images/origi...
1,911201,https://www.tvmaze.com/episodes/911201/westwor...,Chestnut,1,2,regular,09-10-2016,09:00 PM,60,7.7,Bernard suspects that someone is sabotaging th...,https://static.tvmaze.com/uploads/images/mediu...,https://static.tvmaze.com/uploads/images/origi...
2,911204,https://www.tvmaze.com/episodes/911204/westwor...,The Stray,1,3,regular,16-10-2016,09:00 PM,60,7.6,Bernard continues to investigate Dolores' supp...,https://static.tvmaze.com/uploads/images/mediu...,https://static.tvmaze.com/uploads/images/origi...
3,911205,https://www.tvmaze.com/episodes/911205/westwor...,Dissonance Theory,1,4,regular,23-10-2016,09:00 PM,60,7.9,While Dolores joins William and Logan on their...,https://static.tvmaze.com/uploads/images/mediu...,https://static.tvmaze.com/uploads/images/origi...
4,927174,https://www.tvmaze.com/episodes/927174/westwor...,Contrapasso,1,5,regular,30-10-2016,09:00 PM,60,8.0,Dolores takes the first step on her path of di...,https://static.tvmaze.com/uploads/images/mediu...,https://static.tvmaze.com/uploads/images/origi...


In [31]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36 entries, 0 to 35
Data columns (total 13 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   id                   36 non-null     int64  
 1   url                  36 non-null     object 
 2   name                 36 non-null     object 
 3   season               36 non-null     int64  
 4   number               36 non-null     int64  
 5   type                 36 non-null     object 
 6   airdate              36 non-null     object 
 7   airtime              36 non-null     object 
 8   runtime              36 non-null     int64  
 9   average_rating       36 non-null     float64
 10  summary              36 non-null     object 
 11  medium_image_link    36 non-null     object 
 12  original_image_link  36 non-null     object 
dtypes: float64(1), int64(4), object(8)
memory usage: 3.8+ KB


In [32]:
# Cast runtime from int64 to float64 to match the required "runtime - float" type
df['runtime'] = df['runtime'].astype(float)

In [33]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36 entries, 0 to 35
Data columns (total 13 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   id                   36 non-null     int64  
 1   url                  36 non-null     object 
 2   name                 36 non-null     object 
 3   season               36 non-null     int64  
 4   number               36 non-null     int64  
 5   type                 36 non-null     object 
 6   airdate              36 non-null     object 
 7   airtime              36 non-null     object 
 8   runtime              36 non-null     float64
 9   average_rating       36 non-null     float64
 10  summary              36 non-null     object 
 11  medium_image_link    36 non-null     object 
 12  original_image_link  36 non-null     object 
dtypes: float64(2), int64(3), object(8)
memory usage: 3.8+ KB


In [34]:
# Save the DataFrame to a CSV file
df.to_csv("episodes_data.csv", index=False)

print("Data saved as episodes_data.csv")

Data saved as episodes_data.csv
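If pandas is only needed for this final step, the same kind of CSV can be written with the standard library `csv` module. This sketch assumes the rows are dicts keyed by the column names used in the DataFrame above; `write_episodes_csv` is a hypothetical helper that writes to any text stream, so it can be checked against an in-memory `io.StringIO` buffer. `None` values and missing keys become empty fields, and fields containing commas (common in summaries) are quoted automatically.

```python
import csv
import io
from typing import Any, Dict, Iterable, List, TextIO

COLUMNS: List[str] = [
    "id", "url", "name", "season", "number", "type", "airdate", "airtime",
    "runtime", "average_rating", "summary", "medium_image_link", "original_image_link",
]


def write_episodes_csv(rows: Iterable[Dict[str, Any]], stream: TextIO) -> None:
    """Write episode rows as CSV with a header row; gaps become empty fields."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)  # restval="" fills missing keys
    writer.writeheader()
    writer.writerows(rows)


# In-memory demo; for a real file use open("episodes_data.csv", "w", newline="")
buf = io.StringIO()
write_episodes_csv([{"id": 1, "name": "Pilot", "season": 1}], buf)
```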
