# 🎀 Analysis of Female-Led ("Girly") TV Shows on Netflix

This notebook focuses on identifying and analyzing female-led TV shows on Netflix
using descriptive data analysis techniques.

## Dataset Structure

In this step, we examine the overall structure of the dataset,
including its size, columns, and data types.

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use("default")

In [4]:
df = pd.read_csv("../data/raw/netflix_titles.csv")

## 1. Dataset Size

Before starting the analysis, we check the number of rows and columns
to understand the scale of the dataset.

In [5]:
df.shape

(8807, 12)

## 2. Column Overview

In this step, we list all columns in the dataset
to understand what information is available for analysis.

In [6]:
df.columns

Index(['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added',
       'release_year', 'rating', 'duration', 'listed_in', 'description'],
      dtype='object')

## 3. Data Types and Missing Values

In this section, we inspect data types and identify missing values
to assess data quality before deeper analysis.

In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8807 entries, 0 to 8806
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       8807 non-null   object
 1   type          8807 non-null   object
 2   title         8807 non-null   object
 3   director      6173 non-null   object
 4   cast          7982 non-null   object
 5   country       7976 non-null   object
 6   date_added    8797 non-null   object
 7   release_year  8807 non-null   int64 
 8   rating        8803 non-null   object
 9   duration      8804 non-null   object
 10  listed_in     8807 non-null   object
 11  description   8807 non-null   object
dtypes: int64(1), object(11)
memory usage: 825.8+ KB


## 4. Missing Values Overview

This step shows the exact number of missing values in each column,
allowing us to identify problematic fields.

In [8]:
df.isnull().sum().sort_values(ascending=False)

director        2634
country          831
cast             825
date_added        10
rating             4
duration           3
show_id            0
type               0
title              0
release_year       0
listed_in          0
description        0
dtype: int64
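Raw counts are easier to compare once they are expressed as a share of all rows. The following is a minimal sketch on a small made-up frame (`toy` and its values are hypothetical, not the Netflix data); it relies on the fact that the mean of a boolean mask is the fraction of `True` values.

```python
import pandas as pd

# Hypothetical toy frame standing in for df; the values are made up.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": [None, "X", None, "Y"],
    "cast": ["P, Q", None, "R", "S"],
})

# isnull() gives a boolean mask; its column-wise mean is the missing fraction.
missing_share = (
    toy.isnull().mean().mul(100).round(1).sort_values(ascending=False)
)
print(missing_share)
```

Applied to the real data, the same expression would start from `df.isnull().mean()`. Going by the counts above, `director` is missing in 2634 of 8807 rows, roughly 29.9%.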

## 5. Handling Missing Cast Information

Since identifying female-led TV shows requires cast information,
titles with missing cast data are excluded from further analysis.

This is a deliberate choice: dropping these rows keeps the classification
based on observed data instead of assumed or imputed cast members.

In [9]:
df_clean = df.dropna(subset=["cast"])

In [10]:
df_clean.shape

(7982, 12)
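A quick consistency check is that the number of rows removed equals the number of missing `cast` values. In the notebook, 8807 - 7982 = 825, which matches the missing count reported for `cast`. The sketch below shows the same check on a hypothetical toy frame (`toy` is made up for illustration).

```python
import pandas as pd

# Hypothetical toy frame; the notebook itself uses df from netflix_titles.csv.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "cast": ["P, Q", None, "R", None],
})

toy_clean = toy.dropna(subset=["cast"])

# Rows removed should equal the number of missing cast values.
rows_dropped = len(toy) - len(toy_clean)
missing_cast = int(toy["cast"].isnull().sum())
print(rows_dropped, missing_cast)
```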

## 6. Filtering Only TV Shows

To keep the analysis focused and consistent,
we limit the dataset to TV Shows only.
Movies are excluded since character-centric analysis
is more meaningful for series.

In [11]:
df_tv = df_clean[df_clean["type"] == "TV Show"]

In [12]:
df_tv.shape

(2326, 12)

## 7. Defining Female-Led TV Shows

Since the dataset does not explicitly indicate the gender of main characters,
we define a TV show as *female-led* if the first listed cast member
has a female first name.

This heuristic provides a simple and transparent approximation
suitable for an exploratory analysis.

In [13]:
df_tv = df_clean[df_clean["type"] == "TV Show"].copy()

In [14]:
df_tv["main_actor"] = df_tv["cast"].str.split(",").str[0]

In [15]:
df_tv[["cast", "main_actor"]].head()

|   | cast | main_actor |
|---|------|------------|
| 1 | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... | Ama Qamata |
| 2 | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | Sami Bouajila |
| 4 | Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... | Mayur More |
| 5 | Kate Siegel, Zach Gilford, Hamish Linklater, H... | Kate Siegel |
| 8 | Mel Giedroyc, Sue Perkins, Mary Berry, Paul Ho... | Mel Giedroyc |
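The extraction of the first listed cast member can be sketched on a few hand-written strings. This is an illustrative variant, not the notebook's exact cell: it adds `.str.strip()` on the assumption that some entries might carry stray whitespace, and the second sample string is made up to show that case.

```python
import pandas as pd

# Sample cast strings; the stray whitespace in the second one is made up.
cast = pd.Series([
    "Kate Siegel, Zach Gilford",
    "  Mel Giedroyc , Sue Perkins",
    "Jessica Alba",  # a cast list with a single name still works
])

# Take the text before the first comma, then trim surrounding whitespace.
main_actor = cast.str.split(",").str[0].str.strip()
print(main_actor.tolist())
```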


## 8. Identifying Female-Led TV Shows

To classify TV shows as female-led, we compare the first name
of the main actor against a predefined list of common female names.

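This rule-based approach is transparent and easy to interpret, which suits
an exploratory analysis. Because the list is short, any lead whose first name
is not on it is counted as not female-led, so the resulting count is likely
an underestimate.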

In [16]:
female_names = [
    "Jessica", "Emily", "Sarah", "Emma", "Olivia", "Sophia", "Anna",
    "Millie", "Elizabeth", "Jennifer", "Rachel", "Claire", "Lucy",
    "Lily", "Grace", "Natalie", "Victoria", "Zoe", "Amy", "Kate"
]

In [17]:
df_tv["first_name"] = df_tv["main_actor"].str.split().str[0]

In [18]:
df_tv[["main_actor", "first_name"]].head()

|   | main_actor | first_name |
|---|------------|------------|
| 1 | Ama Qamata | Ama |
| 2 | Sami Bouajila | Sami |
| 4 | Mayur More | Mayur |
| 5 | Kate Siegel | Kate |
| 8 | Mel Giedroyc | Mel |


In [19]:
df_tv["female_led"] = df_tv["first_name"].isin(female_names)

In [20]:
df_tv["female_led"].value_counts()

female_led
False    2251
True       75
Name: count, dtype: int64
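In the output above, 75 of 2326 shows (about 3.2%) are flagged, which reflects how short the name list is. One small hardening step, sketched here as an assumption and not as part of the original notebook, is to compare lowercase names against a set so that capitalization differences do not cause misses. The three-name set and the sample series are hypothetical.

```python
import pandas as pd

# Lowercase subset of the name list, for illustration only.
female_names = {"jessica", "emily", "kate"}

# Made-up first names, including a missing value.
first_name = pd.Series(["Kate", "KATE", "Mel", None])

# Lowercase before matching; missing names are simply not matched.
female_led = first_name.str.lower().isin(female_names)
print(female_led.tolist())
```

Missing first names come out as `False` here, which is consistent with treating unknown leads as not female-led.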

## 9. Defining "Girly" TV Shows Based on Genres 💅

Since "girly" is not an official Netflix genre, we define girly TV shows
as those listed under at least one genre commonly associated with
female-centered storytelling: romance, teen shows, drama, comedy, or reality TV.

This definition is used as a thematic approximation for exploratory analysis.

In [21]:
girly_genres = [
    "Romantic TV Shows",
    "Teen TV Shows",
    "TV Dramas",
    "TV Comedies",
    "Reality TV"
]

In [22]:
df_tv["is_girly"] = df_tv["listed_in"].apply(
    lambda x: any(genre in x for genre in girly_genres)
)

In [23]:
df_tv["is_girly"].value_counts()

is_girly
True     1509
False     817
Name: count, dtype: int64
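The genre check above tests whether each target string occurs anywhere inside `listed_in`. A stricter alternative, sketched here under the assumption that genre labels are separated by commas, splits the field into individual labels and looks for an exact match. The helper `has_girly_genre` and the sample strings are hypothetical, and for this particular genre list the two approaches may well give the same result.

```python
import pandas as pd

girly_genres = {
    "Romantic TV Shows", "Teen TV Shows", "TV Dramas",
    "TV Comedies", "Reality TV",
}

# Sample listed_in values in the dataset's comma-separated style.
listed_in = pd.Series([
    "TV Dramas, TV Horror, TV Mysteries",
    "Crime TV Shows, Docuseries",
    "Romantic TV Shows, TV Dramas",
])

def has_girly_genre(genres: str) -> bool:
    """Return True if any exact genre label is in girly_genres."""
    labels = {label.strip() for label in genres.split(",")}
    return not labels.isdisjoint(girly_genres)

is_girly = listed_in.apply(has_girly_genre)
print(is_girly.tolist())
```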

In [24]:
girly_female_led = df_tv[df_tv["female_led"] & df_tv["is_girly"]]

In [25]:
girly_female_led.shape

(48, 16)
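The overlap is easier to read as a share. From the counts in this notebook, 48 of the 75 female-led shows (64%) are also girly, while 48 of the 1509 girly shows (about 3.2%) are female-led. The sketch below shows how such shares could be computed from two boolean columns; the frame `toy` is made up for illustration.

```python
import pandas as pd

# Hypothetical flags for six shows; values are made up.
toy = pd.DataFrame({
    "female_led": [True, True, False, False, False, True],
    "is_girly":   [True, False, True, True, False, True],
})

# Shows that satisfy both conditions.
both = int((toy["female_led"] & toy["is_girly"]).sum())

# Share of female-led shows that are girly, and the reverse.
share_of_female_led = both / toy["female_led"].sum()
share_of_girly = both / toy["is_girly"].sum()
print(both, round(share_of_female_led, 3), round(share_of_girly, 3))
```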

In [26]:
girly_female_led.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,main_actor,first_name,female_led,is_girly
5,s6,TV Show,Midnight Mass,Mike Flanagan,"Kate Siegel, Zach Gilford, Hamish Linklater, H...",,"September 24, 2021",2021,TV-MA,1 Season,"TV Dramas, TV Horror, TV Mysteries",The arrival of a charismatic young priest brin...,Kate Siegel,Kate,True,True
222,s223,TV Show,Clickbait,Brad Anderson,"Zoe Kazan, Betty Gabriel, Adrian Grenier, Phoe...",,"August 25, 2021",2021,TV-MA,1 Season,"Crime TV Shows, TV Dramas, TV Mysteries",When family man Nick Brewer is abducted in a c...,Zoe Kazan,Zoe,True,True
477,s478,TV Show,Atypical,,"Jennifer Jason Leigh, Keir Gilchrist, Michael ...",United States,"July 9, 2021",2021,TV-14,4 Seasons,"TV Comedies, TV Dramas, Teen TV Shows",When a teen on the autism spectrum decides to ...,Jennifer Jason Leigh,Jennifer,True,True
638,s639,TV Show,Sex/Life,,"Sarah Shahi, Mike Vogel, Adam Demos, Margaret ...",United States,"June 25, 2021",2021,TV-MA,1 Season,"Romantic TV Shows, TV Dramas",A woman's daring sexual past collides with her...,Sarah Shahi,Sarah,True,True
749,s750,TV Show,L.A.’s Finest,,"Jessica Alba, Gabrielle Union",United States,"June 9, 2021",2021,TV-MA,2 Seasons,"Crime TV Shows, TV Action & Adventure, TV Come...","In this spinoff of the ""Bad Boys"" franchise, t...",Jessica Alba,Jessica,True,True
