# Netflix TV Show Cancellation Analysis - Step 1: Data Loading and Inspection
This notebook documents our data wrangling process. This first step loads and inspects the datasets we use to analyze Netflix TV show cancellations in 2023. The datasets (in the `data` folder) are:
-  `cancelled_netflix_shows.csv`: Information about cancelled Netflix TV shows, including their `Year`, `Title`, `Network`, and `Genre`. The .csv file was produced with the scraping script provided on Kaggle, modified only to scrape Netflix shows instead of all shows. The script can be found in `scraper.py`.
-  `What_We_Watched_A_Netflix_Engagement_Report_2023Jan-Jun.xlsx` and `What_We_Watched_A_Netflix_Engagement_Report_2023Jul-Dec.xlsx`: Netflix engagement reports containing hours viewed for TV shows and movies.
-  `title.basics.tsv`: IMDb dataset containing metadata for all titles.

#### Step 0: Install libraries

In [1]:
#!pip install pandas
#!pip install numpy
#!pip install openpyxl

#### Step 1: Import Libraries and set options

In [1]:
#Import necessary libraries 
import pandas as pd #For data manipulation and analysis
import numpy as np #For numerical operations 
from datetime import datetime #For handling date columns

pd.set_option("display.max_rows", 20) #limit the number of rows displayed when printing a DataFrame

#### Step 2: Load datasets
##### Step 2.1: Load the cancelled Netflix TV shows dataset

In [4]:
# Load the cancelled Netflix TV shows dataset 
cancelled_netflix_shows_path = '../data/cancelled_netflix_shows.csv'
cancelled_netflix_shows = pd.read_csv(cancelled_netflix_shows_path)

Step 2.1.1: Inspect DataFrame of the cancelled Netflix TV shows

In [None]:
print("Canceled Netflix TV Shows DataFrame:")
cancelled_netflix_shows

Canceled Netflix TV Shows DataFrame:


| | Title | Year | Network | Genre |
|---|---|---|---|---|
| 0 | Girls5eva | 2021 - 2024 | Netflix | Comedy |
| 1 | Unstable | 2023 - 2024 | Netflix | Comedy |
| 2 | KAOS | 2024 | Netflix | Drama / Fantasy |
| 3 | That '90s Show | 2023 - 2024 | Netflix | Comedy |
| 4 | Buying London | 2024 | Netflix | Reality |
| ... | ... | ... | ... | ... |
| 259 | The Politician | 2019 - 2020 | Netflix | Drama / Comedy |
| 260 | Dancing Queen | 2018 | Netflix | Reality |
| 261 | Boo, Bitch | 2022 | Netflix | Comedy |
| 262 | The Order | 2019 - 2020 | Netflix | Drama / Horror |


##### Step 2.2: Load the Netflix engagement reports

In [None]:
# Load the Netflix engagement reports for Jan-Jun and Jul-Dec 2023.
engagement_1H2023_path = '../data/What_We_Watched_A_Netflix_Engagement_Report_2023Jan-Jun.xlsx'
engagement_2H2023_path = '../data/What_We_Watched_A_Netflix_Engagement_Report_2023Jul-Dec.xlsx'

#Define which columns to load from the reports
cols = "B:E" #Columns B to E hold the relevant data: Title, Available Globally?, Release Date, Hours Viewed

#Load the Jan-Jun 2023 report
#This report has a single sheet that combines TV and films, so we will need to filter out films later.
engagement_1H2023 = pd.read_excel(
    engagement_1H2023_path,
    #No sheet_name given, so pandas reads the first sheet (this report has no separate TV/film tabs)
    header=5, #Use the 6th row as the header; the first 5 rows contain report metadata, not data
    usecols=cols #Load only the relevant columns
)

#Load the Jul-Dec 2023 report
#This report has a separate "TV" tab, so we can load it directly without needing to filter for TV content.
engagement_2H2023 = pd.read_excel(
    engagement_2H2023_path,
    sheet_name='TV', #Specify the "TV" tab to exclude film data
    header=5, #Use the 6th row as the header; the first 5 rows contain report metadata, not data
    usecols=cols #Load only the relevant columns
)

Step 2.2.1: Display DataFrame for January-June 2023

In [None]:
print("Netflix Engagement Report (Jan-Jun 2023):")
engagement_1H2023 

Netflix Engagement Report (Jan-Jun 2023):


| | Title | Available Globally? | Release Date | Hours Viewed |
|---|---|---|---|---|
| 0 | The Night Agent: Season 1 | Yes | 2023-03-23 | 812100000 |
| 1 | Ginny & Georgia: Season 2 | Yes | 2023-01-05 | 665100000 |
| 2 | The Glory: Season 1 // 더 글로리: 시즌 1 | Yes | 2022-12-30 | 622800000 |
| 3 | Wednesday: Season 1 | Yes | 2022-11-23 | 507700000 |
| 4 | Queen Charlotte: A Bridgerton Story | Yes | 2023-05-04 | 503000000 |
| ... | ... | ... | ... | ... |
| 18209 | راس السنة | No | NaT | 100000 |
| 18210 | 心が叫びたがってるんだ。 | No | NaT | 100000 |
| 18211 | 두근두근 내 인생 | No | NaT | 100000 |
| 18212 | 라디오 스타 | No | NaT | 100000 |


Step 2.2.2: Display DataFrame for July-December 2023

In [None]:
print("Netflix Engagement Report: TV (Jul-Dec 2023):")
engagement_2H2023

Netflix Engagement Report: TV (Jul-Dec 2023):


| | Title | Available Globally? | Release Date | Hours Viewed |
|---|---|---|---|---|
| 0 | ONE PIECE: Season 1 | Yes | 2023-08-31 | 541900000 |
| 1 | Dear Child: Limited Series // Liebes Kind: Min... | Yes | 2023-09-07 | 252800000 |
| 2 | Who is Erin Carter?: Limited Series | Yes | 2023-08-24 | 286200000 |
| 3 | Lupin: Part 3 | Yes | 2023-10-05 | 274300000 |
| 4 | The Witcher: Season 3 | Yes | 2023-06-29 | 363800000 |
| ... | ... | ... | ... | ... |
| 6594 | We Are Black and British: Season 1 | No | | 100000 |
| 6595 | Whitney Cummings: Can I Touch It? | Yes | 2019-07-30 | 100000 |
| 6596 | Whitney Cummings: Jokes | No | 2022-07-26 | 100000 |
| 6597 | Whose Vote Counts, Explained: Limited Series | Yes | 2020-09-28 | 100000 |


#### Step 3: Inspect datasets
##### Step 3.1: Inspect the cancelled Netflix shows dataset

In [10]:
#Check the structure:
print("Cancelled Netflix TV shows dataset info:")
print(cancelled_netflix_shows.info())

#Check for missing values in the dataset:
print("Missing values in Cancelled Netflix TV shows dataset:")
print(cancelled_netflix_shows.isnull().sum())

Cancelled Netflix TV shows dataset info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 264 entries, 0 to 263
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Title    264 non-null    object
 1   Year     264 non-null    object
 2   Network  264 non-null    object
 3   Genre    263 non-null    object
dtypes: object(4)
memory usage: 8.4+ KB
None
Missing values in Cancelled Netflix TV shows dataset:
Title      0
Year       0
Network    0
Genre      1
dtype: int64


<b>Step 3.1.1: Observations for the cancelled Netflix TV shows dataset</b>

1. Data Types:
    - All columns are of type `object`, including the `Year` column, which holds the years a show ran. This column will need further processing to:
        - Distinguish between single-year and multi-year entries (e.g., "2023" vs. "2021 - 2023")
        - Extract the last year for analysis purposes (to identify shows cancelled in 2023)

2. Missing Data:
    - There is 1 missing value in the `Genre` column. Since `Genre` is not critical for this project (our focus is on `Title` and `Year`), we can safely drop this column without affecting our analysis.

3. Data Quality: 
    - The `Title` column has no missing values and will be crucial for merging this dataset with the Netflix engagement reports.
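The `Year` processing described above can be sketched as follows. This is a minimal sketch assuming every entry is either `YYYY` or `YYYY - YYYY` (the formats visible in the output above); the column names `start_year`, `end_year` and `multi_year` are hypothetical and not used elsewhere in this notebook. The sample rows are taken from the displayed DataFrame.

```python
import pandas as pd

# Sample rows copied from the cancelled-shows output; only Title and Year are needed.
shows = pd.DataFrame({
    "Title": ["Everything Now", "Ada Twist, Scientist", "Unstable", "KAOS"],
    "Year": ["2023", "2021 - 2023", "2023 - 2024", "2024"],
})

# Capture the first year and an optional second year after " - ".
years = shows["Year"].str.extract(r"^(\d{4})(?: - (\d{4}))?$")
shows["start_year"] = years[0].astype(int)
# Single-year entries have no second group, so fall back to the first year.
shows["end_year"] = years[1].fillna(years[0]).astype(int)
shows["multi_year"] = years[1].notna()

# A show ran only in 2023 or ended in 2023 exactly when its last year is 2023.
ended_2023 = shows[shows["end_year"] == 2023]
print(ended_2023[["Title", "start_year", "end_year"]])
```

With the last year extracted, both 2023 cases reduce to the single condition `end_year == 2023`. An entry in any other format would produce NaN in the extract and make `astype(int)` raise, so such rows would need handling first.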

##### Step 3.2: Inspect the Netflix engagement reports dataset (1/2)

In [11]:
#Check the structure for the Jan-Jun 2023 report:
print("Netflix Engagement Report (Jan-Jun 2023) Info:")
print(engagement_1H2023.info())

# Check for missing values in the Jan-Jun report
print("Missing Values in Netflix Engagement Report (Jan-Jun 2023):")
print(engagement_1H2023.isnull().sum())

Netflix Engagement Report (Jan-Jun 2023) Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18214 entries, 0 to 18213
Data columns (total 4 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   Title                18214 non-null  object        
 1   Available Globally?  18214 non-null  object        
 2   Release Date         4855 non-null   datetime64[ns]
 3   Hours Viewed         18214 non-null  int64         
dtypes: datetime64[ns](1), int64(1), object(2)
memory usage: 569.3+ KB
None
Missing Values in Netflix Engagement Report (Jan-Jun 2023):
Title                      0
Available Globally?        0
Release Date           13359
Hours Viewed               0
dtype: int64


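<b>Step 3.2.1: Observations for the Netflix engagement report (Jan-Jun 2023)</b>

1. Missing `Release Date`:
    - About 73% of rows (13,359 of 18,214) have no `Release Date`, so many titles lack release information. This does not block the analysis, since the data preparation step keeps only `Title` and `Hours Viewed`.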

2. TV vs. Movies:
    - The dataset includes both TV shows and movies, which will require filtering based on criteria such as the presence of "Season" in the `Title` column or matching with the IMDb dataset (for non-cancelled shows).
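A minimal sketch of the title-based heuristic mentioned above, assuming that a season-style marker (the same marker list used in the data preparation step) or a `: Limited Series` suffix signals a TV title. The `TV_PATTERN` name and the `: Limited Series` rule are assumptions for illustration; the sample titles come from the outputs above.

```python
import pandas as pd

# Heuristic only: a season-style marker or the "Limited Series" label => TV.
# Non-capturing groups (?:...) keep str.contains from warning about match groups.
TV_PATTERN = (
    r": (?:Season|Series|Volume|Book|Part|Chapter|Temporada|Collection) \d+"
    r"|: Limited Series"
)

report = pd.DataFrame({
    "Title": [
        "The Night Agent: Season 1",
        "Queen Charlotte: A Bridgerton Story",
        "Kaleidoscope: Limited Series",
        "Wednesday: Season 1",
    ],
})

is_tv = report["Title"].str.contains(TV_PATTERN, regex=True, na=False)
tv_like = report[is_tv]   # titles the heuristic treats as TV
other = report[~is_tv]    # films, but also TV titles without a marker
print(tv_like["Title"].tolist())
print(other["Title"].tolist())
```

"Queen Charlotte: A Bridgerton Story" is a TV limited series but carries no marker, so the heuristic puts it in `other`; misses like this are why matching against an external source such as IMDb is also mentioned.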

##### Step 3.2: Inspect the Netflix engagement reports dataset (2/2)

In [12]:
#Check the structure for the Jul-Dec 2023 TV report:
print("Netflix Engagement Report (Jul-Dec 2023: TV) Info:")
print(engagement_2H2023.info())

# Check for missing values in the Jul-Dec TV report:
print("Missing Values in Netflix Engagement Report (Jul-Dec 2023: TV):")
print(engagement_2H2023.isnull().sum())

Netflix Engagement Report (Jul-Dec 2023: TV) Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6599 entries, 0 to 6598
Data columns (total 4 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Title                6599 non-null   object
 1   Available Globally?  6599 non-null   object
 2   Release Date         3312 non-null   object
 3   Hours Viewed         6599 non-null   int64 
dtypes: int64(1), object(3)
memory usage: 206.3+ KB
None
Missing Values in Netflix Engagement Report (Jul-Dec 2023: TV):
Title                     0
Available Globally?       0
Release Date           3287
Hours Viewed              0
dtype: int64


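<b>Step 3.2.2: Observations for the Netflix engagement report (Jul-Dec 2023)</b>

1. Missing `Release Date`:
    - Roughly half of the rows (3,287 of 6,599) have no `Release Date`. Unlike the Jan-Jun report, where it was parsed as `datetime64[ns]`, `Release Date` loaded here as `object`.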

2. No Filtering for TV Shows:
    - Because this dataset is already limited to TV shows, no additional filtering by type is needed.
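If `Release Date` is needed later, one way to give both reports the same dtype is `pd.to_datetime` with `errors="coerce"`. A minimal sketch on a toy Series (two dates from the output above plus a blank); we have not checked whether the real column contains other unparseable values.

```python
import pandas as pd

# Toy "Release Date" values: two ISO date strings and one blank (None).
release = pd.Series(["2023-08-31", None, "2019-07-30"], name="Release Date")

# errors="coerce" turns anything that cannot be parsed (including blanks)
# into NaT instead of raising, yielding a datetime64 column.
parsed = pd.to_datetime(release, errors="coerce")
print(parsed)
```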

##### Step 3.3: Inspect the IMDb dataset

In [None]:
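#Load the IMDb title basics dataset (tab-separated; IMDb uses "\N" for missing values)
imdb_data_path = '../data/title.basics.tsv'
imdb_data = pd.read_csv(imdb_data_path, sep='\t', na_values='\\N', low_memory=False)

#Check the structure and data types in the IMDb dataset
print("IMDb Dataset Info:")
print(imdb_data.info())

#Check for missing values in the IMDb dataset
print("Missing Values in IMDb Dataset:")
print(imdb_data.isnull().sum())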

<b>Step 3.3.1: Observations for IMDb dataset</b>

1. Purpose of the IMDb Dataset:
    -  This dataset will be used only for identifying non-cancelled TV shows.
    - Relevant fields include:
        - `titleType`: To filter for TV series (`titleType == 'tvSeries'`).
        - `startYear` and `endYear`: To identify Netflix TV shows that were active in 2023 and weren't cancelled (based on a match with the Netflix engagement reports).
        - `primaryTitle`: To match titles with the Netflix engagement report datasets.

2. Missing Data:
    - Missing values in `primaryTitle` and `originalTitle` (19 entries) are likely negligible and can be dropped.
    - Missing values in `genres` are not critical, since this column is not required for our project.
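The `titleType`/`startYear`/`endYear` filter described above can be sketched on a few made-up rows in the `title.basics.tsv` layout. This assumes "active in 2023" means the series started in or before 2023 and has either no `endYear` or an `endYear` of 2023 or later; the rows and the `sample_tsv`/`active_2023` names are hypothetical.

```python
import io

import pandas as pd

# Made-up rows in the column layout of IMDb's title.basics.tsv.
sample_tsv = (
    "tconst\ttitleType\tprimaryTitle\toriginalTitle\tisAdult\t"
    "startYear\tendYear\truntimeMinutes\tgenres\n"
    "tt0000001\ttvSeries\tShow A\tShow A\t0\t2021\t\\N\t45\tDrama\n"
    "tt0000002\ttvSeries\tShow B\tShow B\t0\t2019\t2022\t30\tComedy\n"
    "tt0000003\tmovie\tFilm C\tFilm C\t0\t2023\t\\N\t110\tAction\n"
    "tt0000004\ttvSeries\tShow D\tShow D\t0\t2023\t2023\t50\t\\N\n"
)

# IMDb marks missing fields with the literal string "\N".
imdb = pd.read_csv(io.StringIO(sample_tsv), sep="\t", na_values="\\N")

series = imdb[imdb["titleType"] == "tvSeries"]
# Started in or before 2023 and either still running (no endYear)
# or ended in 2023 or later.
active_2023 = series[
    (series["startYear"] <= 2023)
    & (series["endYear"].isna() | (series["endYear"] >= 2023))
]
print(active_2023["primaryTitle"].tolist())
```

On the real file, the same `read_csv` call would take the TSV path instead of `io.StringIO`; since the file is large, passing `usecols` with only the needed columns may help.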

---

# Netflix TV Show Cancellation Analysis - Step 2: Data Preparation
After inspecting the datasets in Step 1, the focus of Step 2 is on data preparation, including normalization, filtering, and merging tasks that will create a cleaned dataset for analysis.

## Step 1: Merge Netflix engagement reports
To ensure we have complete data for Netflix TV shows in 2023, we merge the Jan-Jun and Jul-Dec Netflix engagement reports. Only titles that appear in <b>both reports</b> will be retained, ensuring full-year data for each title. Because the Jul-Dec report only covers TV, this inner join also effectively filters out the films in the Jan-Jun report.

In [14]:
#Retain only relevant columns from each report 
#Extract the 'Title' and 'Hours Viewed' columns from each report for simplicity:
engagement_1H2023_trimmed = engagement_1H2023[['Title', 'Hours Viewed']]
engagement_2H2023_trimmed = engagement_2H2023[['Title', 'Hours Viewed']]

#Merge the two reports on 'Title' (inner join to retain only overlapping titles)
merged_engagement = pd.merge(
    engagement_1H2023_trimmed,
    engagement_2H2023_trimmed,
    on='Title', #Join on the 'Title' column
    how='inner', #Keep only titles present in both reports (also pandas' default)
    suffixes=('_1H2023', '_2H2023') #Distinguish the two 'Hours Viewed' columns
)

#Sum the hours viewed across both halves of the year for each title
merged_engagement['Total Hours Viewed in 2023'] = (
    merged_engagement['Hours Viewed_1H2023'] + merged_engagement['Hours Viewed_2H2023']
)

#Keep only the 'Title' and 'Total Hours Viewed in 2023' columns for further analysis
merged_engagement = merged_engagement[['Title', 'Total Hours Viewed in 2023']]

#Inspect the final merged dataset
print("Total hours viewed in 2023 of Netflix TV shows:")
merged_engagement

Total hours viewed in 2023 of Netflix TV shows:


| | Title | Total Hours Viewed in 2023 |
|---|---|---|
| 0 | The Night Agent: Season 1 | 967600000 |
| 1 | Ginny & Georgia: Season 2 | 731300000 |
| 2 | The Glory: Season 1 // 더 글로리: 시즌 1 | 689700000 |
| 3 | Wednesday: Season 1 | 670400000 |
| 4 | Queen Charlotte: A Bridgerton Story | 580600000 |
| ... | ... | ... |
| 5649 | Whitney Cummings: Can I Touch It? | 200000 |
| 5650 | Whitney Cummings: Jokes | 200000 |
| 5651 | Whose Vote Counts, Explained: Limited Series | 200000 |
| 5652 | Yellow Muzi & Friends: Season 1 // 내 마음은 무지 | 500000 |
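The inner join above drops any title that appears in only one half of the year (the observation at the end of Step 3 lists three cancelled shows affected this way). If partial-year titles should be kept instead, an outer join is one alternative; it is not what this notebook does. A minimal sketch on hypothetical titles and numbers:

```python
import pandas as pd

# Hypothetical halves of the year; titles and hours are made up.
h1 = pd.DataFrame({"Title": ["Show A: Season 1", "Show B: Season 2"],
                   "Hours Viewed": [500, 300]})
h2 = pd.DataFrame({"Title": ["Show A: Season 1", "Show C: Season 1"],
                   "Hours Viewed": [200, 400]})

# how="outer" keeps titles from either half; indicator=True adds a
# "_merge" column recording whether a row came from both, left or right.
both = pd.merge(h1, h2, on="Title", how="outer",
                suffixes=("_1H2023", "_2H2023"), indicator=True)

# A title missing from one half contributes 0 hours for that half.
both["Total Hours Viewed in 2023"] = (
    both["Hours Viewed_1H2023"].fillna(0) + both["Hours Viewed_2H2023"].fillna(0)
)
print(both[["Title", "_merge", "Total Hours Viewed in 2023"]])
```

Whether a title seen in only one half should count with 0 hours for the other half is an analysis decision; the `_merge` column makes those rows easy to inspect or filter before deciding.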


## Step 2: Filter last seasons in the merged Netflix engagement report
In this step, we clean the merged engagement report to retain only:
1. <b>Standalone titles:</b> Titles without a season-style marker (e.g., "Queen Charlotte: A Bridgerton Story" or "Kaleidoscope: Limited Series").
2. <b>The last season of each multi-season show:</b> For titles with a marker such as ": Season 2" or ": Part 3", only the entry with the highest number is kept (this also covers shows with a single season listed).

To do this, we:
- Identify standalone shows (no ": Season N"-style marker in the title) and keep them as-is.
- For multi-season shows:
    - Extract the season number using a regex.
    - Keep only the entry for the last season.
- Strip the season information from these titles to obtain the show name.
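To see what the marker regex does on individual titles, here is a small sketch using Python's `re` module with the same pattern as the cells below. `parse_title` is a hypothetical helper, not part of the notebook; the sample titles come from the outputs above.

```python
import re
from typing import Optional, Tuple

# Same marker list as the notebook's regex; (\d+) captures the season number.
SEASON_RE = re.compile(
    r": (?:Season|Series|Volume|Book|Part|Chapter|Temporada|Collection) (\d+)"
)

def parse_title(title: str) -> Tuple[str, Optional[int]]:
    """Return (show name, season number), or (title, None) for standalone titles."""
    match = SEASON_RE.search(title)
    if match is None:
        return title, None
    # Everything before the marker is the show name.
    return title[: match.start()].strip(), int(match.group(1))

for t in [
    "Wednesday: Season 1",
    "Survivor (2000): Season 32: Kaôh Rōng",
    "Kaleidoscope: Limited Series",
    "Lupin: Part 3",
]:
    print(t, "->", parse_title(t))
```

The first marker wins: for "Survivor (2000): Season 32: Kaôh Rōng" the subtitle after the season number is discarded, and `Part` is treated like `Season`, so "Lupin: Part 3" is grouped under "Lupin".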

In [15]:
#1. Identify standalone shows (no season-style marker such as ': Season 1' in the title)
#A standalone show is one that doesn't explicitly mention a season in its title.
#For example, "Queen Charlotte: A Bridgerton Story" is standalone,
#while "Wednesday: Season 1" is treated as a multi-season show.

#Create a mask for titles that do NOT contain
#': (Season|Series|Volume|Book|Part|Chapter|Temporada|Collection) [number]'
#(?:...) is a non-capturing group, which avoids pandas' "has match groups" UserWarning in str.contains
standalone_mask = ~merged_engagement['Title'].str.contains(
    r": (?:Season|Series|Volume|Book|Part|Chapter|Temporada|Collection) \d+",
    regex=True,
    na=False
)
#Separate standalone shows:
standalone_shows = merged_engagement[standalone_mask].copy()

# Print intermediate results to verify correctness
print(f"Number of standalone shows: {len(standalone_shows)}")
print("Sample of standalone shows:")
standalone_shows


Number of standalone shows: 1112
Sample of standalone shows:


| | Title | Total Hours Viewed in 2023 |
|---|---|---|
| 4 | Queen Charlotte: A Bridgerton Story | 580600000 |
| 11 | Kaleidoscope: Limited Series | 274800000 |
| 14 | Crash Course in Romance: Limited Series // 일타 ... | 265900000 |
| 21 | Doctor Cha: Limited Series // 닥터 차정숙: 리미티드 시리즈 | 253900000 |
| 31 | Chiquititas (2013) | 279300000 |
| ... | ... | ... |
| 5648 | W. Kamau Bell: Private School Negro | 200000 |
| 5649 | Whitney Cummings: Can I Touch It? | 200000 |
| 5650 | Whitney Cummings: Jokes | 200000 |
| 5651 | Whose Vote Counts, Explained: Limited Series | 200000 |


In [16]:
#2. Extract season numbers for multi-season shows
#Multi-season shows are those whose title contains a season-style marker such as ": Season 1" or ": Part 3".
#Example: "Wednesday: Season 1", "Stranger Things: Season 4".

#Create a mask for multi-season shows (opposite of standalone_mask)
multi_season_mask = ~standalone_mask
multi_season_shows = merged_engagement[multi_season_mask].copy()

#Extract the season number using Regex and store it in a new column
#The capture group (\d+) picks out the number that follows the marker (Season, Series, Volume, Part, ...)
multi_season_shows['Season'] = (
    multi_season_shows['Title']
    .str.extract(
        r": (?:Season|Series|Volume|Book|Part|Chapter|Temporada|Collection) (\d+)", 
        expand=False
    )  # Extracts the season number
    .fillna(0)
    .astype(int)  # Converts the extracted number to an integer for sorting
)

# Print intermediate results to verify correctness
print(f"Number of multi-season shows: {len(multi_season_shows)}")
print("Sample of multi-season shows with extracted seasons:")
multi_season_shows


Number of multi-season shows: 4542
Sample of multi-season shows with extracted seasons:


| | Title | Total Hours Viewed in 2023 | Season |
|---|---|---|---|
| 0 | The Night Agent: Season 1 | 967600000 | 1 |
| 1 | Ginny & Georgia: Season 2 | 731300000 | 2 |
| 2 | The Glory: Season 1 // 더 글로리: 시즌 1 | 689700000 | 1 |
| 3 | Wednesday: Season 1 | 670400000 | 1 |
| 5 | You: Season 4 | 471400000 | 4 |
| ... | ... | ... | ... |
| 5642 | Two Santas: Season 1 // Zwei Weihnachtsmänner:... | 900000 | 1 |
| 5643 | Undercover Food Fighters: Season 1 // 위장취업: 시즌 1 | 1000000 | 1 |
| 5645 | Vem Dançar com o Universo Z: Season 3 | 200000 | 3 |
| 5646 | Vietnamese Horror Story: Season 1 // Chuyện ma... | 200000 | 1 |


In [17]:
#3. Retain only the last season for multi-season shows
print("\nStep 2.3: Retaining only the last season for multi-season shows...")

#Create a 'Show_Name' column by removing the season marker (e.g. ': Season 1') and everything after it
#This ensures all rows for the same show are grouped correctly
multi_season_shows['Show_Name'] = multi_season_shows['Title'].str.replace(
    r": (Season|Series|Volume|Book|Part|Chapter|Temporada|Collection) \d+.*", 
    "", 
    regex=True
).str.strip()

#Normalize "Show_Name" for consistent grouping
#Convert to lowercase and remove special characters
multi_season_shows['Normalized_Show_Name'] = (
    multi_season_shows['Show_Name']
    .str.lower()
    .str.replace(r"[^\w\s]", "", regex=True) #Remove special characters
    .str.strip() #Remove leading/trailing whitespace
)

#Group by "Normalized_Show_Name" and find the row with the highest season
last_season_shows = (
    multi_season_shows.loc[multi_season_shows.groupby('Normalized_Show_Name')['Season'].idxmax()]
)

#Note: the helper columns 'Show_Name' and 'Normalized_Show_Name' are kept on purpose,
#because the merge in Step 3 joins on 'Normalized_Show_Name'.

#Print intermediate results to verify correctness
# Sort by the highest season for clarity
print(f"Number of shows after retaining last season entries: {len(last_season_shows)}")
print("Sample of last season shows sorted by highest season (descending):")
print(last_season_shows.sort_values(by='Season', ascending=False)[['Title', 'Season', 'Total Hours Viewed in 2023']].head(10))



Step 2.3: Retaining only the last season for multi-season shows...
Number of shows after retaining last season entries: 2352
Sample of last season shows sorted by highest season (descending):
                                                  Title  Season  Total Hours Viewed in 2023
915               Survivor (2000): Season 32: Kaôh Rōng      32                    23100000
4690                         Top Gear (2003): Season 31      31                     1000000
2623                          The Real World: Season 28      28                     4400000
2318                      Thomas and Friends: Season 24      24                     8700000
2386                   Hell's Kitchen (2005): Season 21      21                     5600000
3124  Naruto Shippuden: Season 21 // NARUTO-ナルト- 疾風伝...      21                     4500000
5042                  Bob the Builder (1999): Season 21      21                      800000
2966                            Intervention: Season 21      21        

#### Finally, we concatenate the two separate datasets from this step (`last_season_shows` and `standalone_shows`) into a single DataFrame (`final_engagement_report`):

In [18]:
#Combine the two datasets using pd.concat:
final_engagement_report = pd.concat([standalone_shows, last_season_shows], ignore_index=True)

print(f"Total number of shows in the final engagement report: {len(final_engagement_report)}")
# Print a sample of the combined dataset
print("Sample of the final engagement report:")
print(final_engagement_report[['Title', 'Total Hours Viewed in 2023']].head(30))

Total number of shows in the final engagement report: 3464
Sample of the final engagement report:
                                                Title  Total Hours Viewed in 2023
0                 Queen Charlotte: A Bridgerton Story                   580600000
1                        Kaleidoscope: Limited Series                   274800000
2   Crash Course in Romance: Limited Series // 일타 ...                   265900000
3      Doctor Cha: Limited Series // 닥터 차정숙: 리미티드 시리즈                   253900000
4                                  Chiquititas (2013)                   279300000
..                                                ...                         ...
25                     Harry & Meghan: Limited Series                    72900000
26                               Maid: Limited Series                    98900000
27                           THE DAYS: Limited Series                    82700000
28  MADOFF: The Monster of Wall Street: Limited Se...                    66300000
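As written, `standalone_shows` never gets a `Normalized_Show_Name` column, so after the concat those rows hold NaN there and cannot match anything in the merge in Step 3. If standalone titles should be matchable too, one option is to normalize their full `Title` before concatenating. A minimal sketch with a hypothetical `normalize_title` helper that repeats the notebook's normalization steps, on toy frames built from rows shown above:

```python
import pandas as pd

def normalize_title(titles: pd.Series) -> pd.Series:
    """Lowercase, drop non-word/non-space characters, trim (the notebook's steps)."""
    return (
        titles.str.lower()
        .str.replace(r"[^\w\s]", "", regex=True)
        .str.strip()
    )

# Toy frames shaped like standalone_shows and last_season_shows.
standalone = pd.DataFrame({"Title": ["Queen Charlotte: A Bridgerton Story"],
                           "Total Hours Viewed in 2023": [580600000]})
last_season = pd.DataFrame({"Title": ["Wednesday: Season 1"],
                            "Total Hours Viewed in 2023": [670400000],
                            "Season": [1],
                            "Show_Name": ["Wednesday"]})
last_season["Normalized_Show_Name"] = normalize_title(last_season["Show_Name"])

# Without this line, the standalone row would get NaN after the concat.
standalone["Normalized_Show_Name"] = normalize_title(standalone["Title"])

combined = pd.concat([standalone, last_season], ignore_index=True)
print(combined[["Title", "Normalized_Show_Name"]])
```

A title such as "Kaleidoscope: Limited Series" would still normalize to `kaleidoscope limited series` and so would not match a cancelled-list entry named "Kaleidoscope"; stripping labels like ": Limited Series" would be a further step that this sketch does not attempt.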


## Step 3: Filter relevant TV shows from the cancelled Netflix TV shows dataset
In this step, we filter the cancelled Netflix TV shows dataset to include only the shows relevant to our 2023 analysis. Using the `Year` column, we retain:
1. <b>Shows that only ran in 2023:</b> These are identified by `Year = "2023"`.
2. <b>Shows that ended (i.e., were cancelled) in 2023:</b> These have a `Year` in the format `YYYY - 2023`.

This ensures we exclude:
- Shows that started in 2023 but ended later (e.g., `2023 - 2024`, as for Unstable).
- Any other entries outside these two scenarios.

In [19]:
#1. Define regex patterns for relevant scenarios

#Case 1: Shows that only ran in 2023
single_year_pattern = r"^2023$" #Matches rows where the Year column is exactly "2023"
#Case 2: Shows that ended in 2023
#Matches rows where the Year column is "YYYY - 2023" (multi-year shows that ended in 2023);
#since this dataset only lists cancelled shows, "ended" means "cancelled".
ended_in_2023_pattern = r"^\d{4} - 2023$"

#2. Apply the filters to the cancelled Netflix TV shows dataset
#Retain only shows matching either case 1 or case 2
#(.copy() makes this an independent DataFrame rather than a slice of the original)
filtered_cancelled_netflix_shows = cancelled_netflix_shows[
    cancelled_netflix_shows['Year'].str.match(single_year_pattern) | #Case 1
    cancelled_netflix_shows['Year'].str.match(ended_in_2023_pattern) #Case 2
].copy()

#3. Drop the 'Genre' column since it's not needed
filtered_cancelled_shows = filtered_cancelled_netflix_shows.drop(columns=['Genre'])

#Print the cleaned DataFrame
print("Filtered Canceled Netflix Shows (Relevant to 2023):")
print(filtered_cancelled_shows)

#Print the total number of relevant canceled TV shows
print(f"\nTotal number of relevant canceled Netflix TV shows: {len(filtered_cancelled_shows)}")


Filtered Canceled Netflix Shows (Relevant to 2023):
                       Title         Year  Network
8   My Dad the Bounty Hunter         2023  Netflix
9       Ada Twist, Scientist  2021 - 2023  Netflix
10            Everything Now         2023  Netflix
11            Emergency: NYC         2023  Netflix
12                 Dance 100         2023  Netflix
..                       ...          ...      ...
54            Lockwood & Co.         2023  Netflix
55              Ridley Jones  2021 - 2023  Netflix
58    Bling Empire: New York         2023  Netflix
60                 Freeridge         2023  Netflix
61                  Sex/Life  2021 - 2023  Netflix

[23 rows x 3 columns]

Total number of relevant canceled Netflix TV shows: 23


In [20]:
#Normalize "Title" for matching purposes
#Convert to lowercase and remove special characters
filtered_cancelled_shows['Normalized_Show_Name'] = (
    filtered_cancelled_shows['Title']
    .str.lower()
    .str.replace(r"[^\w\s]", "", regex=True) #Remove special characters
    .str.strip() #Remove leading/trailing whitespace
)

In [23]:
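#Merge the filtered cancelled shows with the final engagement report
#(inner join on the normalized show name, keeping only shows present in both).
#Both inputs have a 'Title' column, so pandas adds suffixes: 'Title_x' (cancelled list), 'Title_y' (engagement report)
merged_cancelled = pd.merge(
    filtered_cancelled_shows,
    final_engagement_report,
    on='Normalized_Show_Name', #Join on the normalized show name
    how='inner'
)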

In [None]:
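#Drop helper columns and the engagement-report title ('Title_y'),
#keeping the show title from the cancelled list ('Title_x') and the total hours viewed
non_cancelled_shows_hoursviewed = merged_cancelled.drop(columns=[
    'Year',
    'Network',
    'Normalized_Show_Name',
    'Title_y',
    'Season',
    'Show_Name'
])

#'Title_y' was dropped above, so rename the remaining 'Title_x' column to 'Title'
non_cancelled_shows_hoursviewed = non_cancelled_shows_hoursviewed.rename(
    columns={'Title_x': 'Title', 'Total Hours Viewed in 2023': 'Hours Viewed'}
)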

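Observation: the following cancelled TV shows appear only in the Jul-Dec (H2) report, not in the Jan-Jun (H1) report, so the inner join of the two reports in Step 1 of the data preparation excludes them:
1. Everything Now
2. Captain Fall
3. Obliterated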
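Rather than spotting missing shows by eye, the cancelled shows that found no match can be listed with a left anti-join (a left merge with `indicator=True`, keeping the `left_only` rows). A minimal sketch on toy frames shaped like `filtered_cancelled_shows` and `final_engagement_report`; which shows match here is made up.

```python
import pandas as pd

# Toy inputs; the match outcomes are invented for illustration.
cancelled = pd.DataFrame({
    "Title": ["Everything Now", "Freeridge", "Sex/Life"],
    "Normalized_Show_Name": ["everything now", "freeridge", "sexlife"],
})
engagement = pd.DataFrame({"Normalized_Show_Name": ["freeridge", "sexlife", "sexlife"]})

# Left join with indicator=True; "left_only" rows have no match in the report.
# drop_duplicates() keeps repeated names from multiplying the left rows.
check = cancelled.merge(
    engagement[["Normalized_Show_Name"]].drop_duplicates(),
    on="Normalized_Show_Name",
    how="left",
    indicator=True,
)
unmatched = check.loc[check["_merge"] == "left_only", "Title"].tolist()
print(unmatched)
```

In the notebook, the same pattern would compare `filtered_cancelled_shows` against `final_engagement_report`; confirming that an unmatched show appears in only one half-year report would then mean looking it up in `engagement_1H2023` and `engagement_2H2023` separately.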

In [None]:
non_cancelled_shows_hoursviewed.to_csv('non_cancelled_shows_hoursviewed.csv')