**Scraping episode metadata**

I found a Netflix blog post that lists every sketch in every episode of *I Think You Should Leave*, along with time stamps. The time stamps will be especially useful because I'll be running sentiment analysis on each sketch, using screenshots and the transcripts (which I found on a fan site).

The page's HTML/CSS structure has no useful hierarchy, so I exported the scraped text and then cleaned it up and assembled it in an external CSV editor (see `episode_metadata.csv`). I manually added columns for season and episode numbers. I'll also assign my own thematic category to each sketch after rewatching every episode.

In [1]:
from bs4 import BeautifulSoup
import lxml  # not called directly; BeautifulSoup uses it as the 'lxml' parser backend
import os
import requests
import csv

In [None]:
# fetch the Netflix blog post that lists episode names, sketch names and time stamps
url = "https://www.netflix.com/tudum/articles/i-think-you-should-leave-sketch-names"
r = requests.get(url)
r.raise_for_status()  # stop here if the request failed instead of parsing an error page

In [None]:
# get html content and pass into soup object
source = r.content
soup = BeautifulSoup(source, 'lxml')

In [None]:
# season headers
# every season header uses the same class, and no div wraps the content of a single season,
# so the season cannot be read off the page structure
soup.find_all('h3', class_='css-2gvxvk')

[<h3 class="css-2gvxvk" data-sel="heading">Season 1</h3>,
 <h3 class="css-2gvxvk" data-sel="heading">Season 2</h3>,
 <h3 class="css-2gvxvk" data-sel="heading">Season  3</h3>]

In [None]:
# episode numbers and names
episode_names = soup.find_all('h3', class_='css-yw72u0')
episode_text = [tag.text for tag in episode_names]
episode_text

['Episode 1:\xa0“Has This Ever Happened to You?”',
 'Episode 2:\xa0“Thanks for Thinking They Are Cool”',
 'Episode 3:\xa0“It’s the Cigars You Smoke That Are Going to Give You Cancer”',
 'Episode 4:\xa0“Oh Crap, a Bunch More Bad Stuff Just Happened”',
 'Episode 5:\xa0“I’m Wearing One of Their Belts Right Now”',
 'Episode 6:\xa0“We Used to Watch This at My Old Work”',
 'Episode 1:\xa0“They said that to me at a dinner.”\xa0',
 'Episode 2:\xa0“They have a cake shop there, Susan, where the cakes just look stunning.”',
 'Episode 3:\xa0“You sure about that? You sure about that, that’s why?”',
 'Episode 4:\xa0“Everyone just needs to be more in the moment.”',
 'Episode 5:\xa0“Didn’t you say there was gonna be five people at this table?”',
 'Episode 6:\xa0“I need a wet paper towel.”',
 'Episode 1: “That was the Earth telling me I’m supposed to be doing something great.”\xa0',
 'Episode 2: “I can do whatever I want.”',
 'Episode 3: “Cut to: We’re chatting about this at your bachelor party.”',
 '\

In [18]:
# export episode_text as csv
with open('episode_metadata.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    for line in episode_text:
        writer.writerow([line])
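
The season column was added by hand, but the episode strings carry enough information to derive it. A minimal sketch, assuming every season on the page starts with an "Episode 1" entry (which holds for the output shown above); `parse_episodes` and `sample_episode_text` are names made up for this illustration, and the sample is only a few of the scraped strings:

```python
import re

# A few of the scraped strings from the output above (not the full list).
sample_episode_text = [
    'Episode 1:\xa0“Has This Ever Happened to You?”',
    'Episode 2:\xa0“Thanks for Thinking They Are Cool”',
    'Episode 1:\xa0“They said that to me at a dinner.”\xa0',
    'Episode 2:\xa0“They have a cake shop there, Susan, where the cakes just look stunning.”',
    'Episode 1: “That was the Earth telling me I’m supposed to be doing something great.”\xa0',
]

# \s also matches the non-breaking space (\xa0) that follows the colon on this page.
EPISODE_PATTERN = re.compile(r'Episode\s+(\d+):\s*(.*)')


def parse_episodes(lines):
    """Return (season, episode, title) tuples.

    Assumption: the season number goes up by one each time the
    episode number resets to 1.
    """
    rows = []
    season = 0
    for line in lines:
        match = EPISODE_PATTERN.match(line.strip())
        if match is None:
            continue  # skip anything that is not an episode heading
        episode = int(match.group(1))
        if episode == 1:
            season += 1
        title = match.group(2).strip().strip('“”')  # drop the curly quotes
        rows.append((season, episode, title))
    return rows


for row in parse_episodes(sample_episode_text):
    print(row)
```

If a season's first heading were ever missing or numbered differently, the season counter would be off, so the result is worth checking against the season headers.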


In [None]:
# get the names of all the sketches
sketch_names = soup.find_all('h3', class_='css-1i31ble')
sketch_text = [tag.text for tag in sketch_names]
sketch_text

['\xa0',
 '“Both Ways”',
 '“Has This Ever Happened to You?”',
 '“Baby of the Year”',
 '“Instagram”',
 '“Gift Receipt”',
 '\xa0',
 '“Biker Guy”',
 '“River Mountain High”',
 '“Wilson’s Toupees”',
 '“Pink Bag”',
 '“River Mountain High”',
 '“The Man”',
 '\xa0',
 '“Which Hand”',
 '“Focus Group”',
 '“Laser Spine Specialists”',
 '“New Joe”',
 '“Game Night”',
 '\xa0',
 '“Lifetime Achievement”',
 '“A Christmas Carol”',
 '“Nachos”',
 '“Traffic”',
 '\xa0',
 '“Brooks Brothers”',
 '“Choking”',
 '“New Printer”',
 '“The Day Robert Palins Murdered Me”',
 '“Babysitter”',
 '\xa0',
 '“Fenton’s Stables and Horse Farm”',
 '“Chunky”',
 '“Bozo”',
 '“Baby Shower”',
 '“Bozo”',
 '“Party House”',
 '“H.D. Vac Part II”',
 '“Corncob TV”',
 '“Prank Show”',
 '“Little Buff Boys”',
 '“Ghost Tour”',
 '\xa0',
 '“The Capital Room”',
 '“Dan Flashes”',
 '“Diner Wink”',
 '“The Shops at the Creeks”',
 '“Baby Cries”',
 '\xa0',
 '“Grambles Lorelei Lounge”',
 '“Crashmore – Trailer”',
 '“H.D. Vac Commercial”',
 '“Crashmore – Junk

In [19]:
# append sketch names to episode_metadata csv
with open('episode_metadata.csv', 'a', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    for line in sketch_text:
        writer.writerow([line])
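
The sketch list contains entries that are only a non-breaking space (`'\xa0'`), and every name is wrapped in curly quotes. A small sketch of the cleanup that could be done in Python rather than in the CSV editor, assuming the blank entries are empty headings and not real sketches (in the visible output, dropping them leaves five names for the first episode, matching its five time stamps). `clean_sketch_names` is a hypothetical helper:

```python
def clean_sketch_names(names):
    """Drop whitespace-only entries and strip the surrounding curly quotes."""
    cleaned = []
    for name in names:
        # str.strip() treats the non-breaking space '\xa0' as whitespace
        text = name.strip()
        if not text:
            continue  # assumed to be an empty heading, not a sketch
        cleaned.append(text.strip('“”'))
    return cleaned


# First few entries of the scraped output above.
sample_sketch_text = [
    '\xa0',
    '“Both Ways”',
    '“Has This Ever Happened to You?”',
    '“Baby of the Year”',
    '“Instagram”',
    '“Gift Receipt”',
    '\xa0',
    '“Biker Guy”',
]

cleaned = clean_sketch_names(sample_sketch_text)
print(cleaned)
```

Repeated names such as "River Mountain High" are kept on purpose, since each occurrence has its own time stamp.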


In [None]:
# time stamps
time_stamps = soup.find_all('div', class_='css-1dufl7l')
time_stamps_text = [tag.text for tag in time_stamps]
time_stamps_text

['Time stamp: 00:00',
 'Time stamp: 01:40',
 'Time stamp: 03:27',
 'Time stamp: 07:25',
 'Time stamp: 09:25',
 'Time stamp: 00:00',
 'Time stamp: 02:15',
 'Time stamp: 04:38',
 'Time stamp: 06:23',
 'Time stamp: 09:27',
 'Time stamp: 10:44',
 'Time stamp: 00:00',
 'Time stamp: 03:50',
 'Time stamp: 06:50',
 'Time stamp: 09:32',
 'Time stamp: 11:38',
 'Time stamp: 00:33',
 'Time stamp: 03:45',
 'Time stamp: 06:05',
 'Time stamp: 09:59',
 'Time stamp: 00:00',
 'Time stamp: 03:12',
 'Time stamp: 06:13',
 'Time stamp: 08:42',
 'Time stamp: 12:06',
 'Time stamp: 00:42',
 'Time stamp: 02:17',
 'Time stamp: 06:07',
 'Time stamp: 08:14',
 'Time stamp: 11:40',
 'Time stamp: 13:50',
 'Time stamp: 00:00',
 'Time stamp: 02:50',
 'Time stamp: 04:42',
 'Time stamp: 08:08',
 'Time stamp: 11:59',
 'Time stamp: 00:00',
 'Time stamp: 02:00',
 'Time stamp: 05:35',
 'Time stamp: 09:03',
 'Time stamp: 10:20',
 'Time stamp: 00:00',
 'Time stamp: 04:01',
 'Time stamp: 05:55',
 'Time \xa0stamp: 07:28',
 'Time

In [20]:
# append time stamps to episode_metadata csv
with open('episode_metadata.csv', 'a', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    for line in time_stamps_text:
        writer.writerow([line])
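
Since the page has no per-episode container, the time stamps themselves are one way to recover episode boundaries. A rough sketch, assuming time stamps rise within an episode and that the first time stamp of the next episode is never later than the last one of the previous episode (true for the output shown above, but not guaranteed). `to_seconds` and `number_episodes` are hypothetical helpers, and the episode index runs across seasons rather than restarting at 1:

```python
import re

# Search for MM:SS anywhere in the label, so irregular spacing such as
# 'Time \xa0stamp: 07:28' does not matter.
TIME_PATTERN = re.compile(r'(\d{1,2}):(\d{2})')


def to_seconds(label):
    """Convert a label like 'Time stamp: 01:40' to seconds."""
    match = TIME_PATTERN.search(label)
    if match is None:
        raise ValueError(f'no MM:SS time found in {label!r}')
    minutes, seconds = int(match.group(1)), int(match.group(2))
    return minutes * 60 + seconds


def number_episodes(labels):
    """Return (running_episode_index, seconds) for each time stamp.

    Assumption: a time stamp that is not later than the previous one
    marks the start of a new episode.
    """
    rows = []
    episode_index = 0
    previous = None
    for label in labels:
        current = to_seconds(label)
        if previous is None or current <= previous:
            episode_index += 1
        rows.append((episode_index, current))
        previous = current
    return rows


sample_time_stamps = [
    'Time stamp: 00:00',
    'Time stamp: 01:40',
    'Time stamp: 09:25',
    'Time stamp: 00:00',
    'Time stamp: 02:15',
    'Time \xa0stamp: 07:28',
]

numbered = number_episodes(sample_time_stamps)
print(numbered)
```

Having the time stamps as seconds also makes it easy to compute sketch lengths later by subtracting consecutive values within an episode.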

In [None]:
# get episode descriptions: all the p elements inside divs with class 'css-6ywxh8'
descriptions = soup.select('div.css-6ywxh8 p')
description_text = [tag.text for tag in descriptions]
description_text

['I Think You Should Leave with Tim Robinson\xa0humbly asks: What if the awkward, everyday scenarios in your life lasted a little too long and made absolutely no sense?\xa0',
 'The Emmy–winning sketch comedy series comes, naturally, from Tim Robinson, who cut his teeth as a performer and writer on\xa0Saturday Night Live and created\xa0Detroiters\xa0alongside Sam Richardson (Veep). Robinson is both creator (with Zach Kanin) and star here, joined by a highly game group of comic actors, from old pal Richardson to Bob Odenkirk with a creepy wink. Together, they learn that magicians suck, observe real people flopping out of coffins and enjoy sloppy steaks.\xa0',
 'You no doubt have your own favorite sketches among the mix, whether it’s the one about the shirts with the complicated patterns or that other one with the guy who has a great idea for a car with a steering wheel that doesn’t fly off while you’re driving. But alas, not every night is a\xa0Friday night, and you sometimes don’t have 

In [21]:
# append descriptions to episode_metadata csv
with open('episode_metadata.csv', 'a', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    for line in description_text:
        writer.writerow([line])
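
The cells above stack every list into a single CSV column, which is why the file needed assembling afterwards. For comparison, a sketch of writing one row per sketch, assuming the cleaned sketch names and time stamps for an episode line up by position (an assumption based on the counts in the visible output, not something the page structure guarantees). `write_sketch_rows` and the column names are made up for illustration, and the example writes to an in-memory buffer instead of `episode_metadata.csv`:

```python
import csv
import io


def write_sketch_rows(handle, season, episode, sketches, time_stamps):
    """Write a header plus one row per sketch for a single episode."""
    if len(sketches) != len(time_stamps):
        raise ValueError('sketches and time stamps must have the same length')
    writer = csv.writer(handle)
    writer.writerow(['season', 'episode', 'sketch', 'time_stamp'])
    for sketch, time_stamp in zip(sketches, time_stamps):
        writer.writerow([season, episode, sketch, time_stamp])


# First three sketches and time stamps from the outputs above,
# paired by position (assumed alignment).
sketches = ['Both Ways', 'Has This Ever Happened to You?', 'Baby of the Year']
time_stamps = ['00:00', '01:40', '03:27']

buffer = io.StringIO(newline='')
write_sketch_rows(buffer, 1, 1, sketches, time_stamps)
print(buffer.getvalue())
```

The length check is there because a silent `zip` would simply truncate to the shorter list and hide a misalignment.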