# 🎀 Analysis of Female-Led ("Girly") TV Shows on Netflix

This notebook focuses on identifying and analyzing female-led TV shows on Netflix
using descriptive data analysis techniques.

## Dataset Structure

In this step, we load the dataset and examine its overall structure,
including its size, columns, and data types.

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use("default")

In [3]:
df = pd.read_csv("../data/raw/netflix_titles.csv")

## 1. Dataset Size

Before starting the analysis, we check the number of rows and columns
to understand the scale of the dataset.

In [4]:
df.shape

(8807, 12)

## 2. Column Overview

In this step, we list all columns in the dataset
to understand what information is available for analysis.

In [5]:
df.columns

Index(['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added',
       'release_year', 'rating', 'duration', 'listed_in', 'description'],
      dtype='object')

## 3. Data Types and Missing Values

In this section, we inspect data types and identify missing values
to assess data quality before deeper analysis.

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8807 entries, 0 to 8806
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       8807 non-null   object
 1   type          8807 non-null   object
 2   title         8807 non-null   object
 3   director      6173 non-null   object
 4   cast          7982 non-null   object
 5   country       7976 non-null   object
 6   date_added    8797 non-null   object
 7   release_year  8807 non-null   int64 
 8   rating        8803 non-null   object
 9   duration      8804 non-null   object
 10  listed_in     8807 non-null   object
 11  description   8807 non-null   object
dtypes: int64(1), object(11)
memory usage: 825.8+ KB


## 4. Missing Values Overview

This step shows the exact number of missing values in each column,
allowing us to identify problematic fields.

In [7]:
df.isnull().sum().sort_values(ascending=False)

director        2634
country          831
cast             825
date_added        10
rating             4
duration           3
show_id            0
type               0
title              0
release_year       0
listed_in          0
description        0
dtype: int64
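
Raw counts are easier to compare when expressed as a share of all rows. With the counts above, `director` is missing in roughly 29.9% of the 8,807 titles, while `country` and `cast` are each missing in about 9.4%. The sketch below shows one way to compute such percentages; it runs on a small hypothetical frame (`toy`, invented for illustration) rather than on the real file, and the same expression should apply to `df`.

```python
import pandas as pd

# Hypothetical toy frame standing in for the Netflix data (not the real file).
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": [None, "X", None, None],
    "cast": ["p, q", None, "r", "s"],
})

# isnull() yields booleans; their column-wise mean is the fraction missing.
missing_pct = (toy.isnull().mean() * 100).round(1).sort_values(ascending=False)
print(missing_pct)
```

Replacing `toy` with `df` gives the per-column missing share for the full dataset.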

## 5. Handling Missing Cast Information

Identifying female-led TV shows requires cast information,
so the 825 titles with missing cast data are excluded from further analysis,
leaving 7,982 of the original 8,807 titles.

This is a deliberate analytical decision: dropping these rows avoids
guessing at or imputing cast members.

In [8]:
df_clean = df.dropna(subset=["cast"])

In [9]:
df_clean.shape

(7982, 12)
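
The drop above can be sanity-checked by counting the removed rows and confirming that no missing `cast` values remain. The following is a minimal sketch on a hypothetical frame (`toy`, invented for illustration), assuming the same `dropna(subset=["cast"])` call; note that missing values in other columns, such as `director`, are left untouched.

```python
import pandas as pd

# Hypothetical toy frame standing in for the Netflix data (not the real file).
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": [None, "X", "Z", "Y"],
    "cast": ["p, q", None, "r", None],
})

# Drop only the rows where "cast" is missing; other columns are not checked.
toy_clean = toy.dropna(subset=["cast"])

# Number of rows removed by the filter.
rows_removed = len(toy) - len(toy_clean)

# Missing directors that survive the filter, since "director" is not in subset.
remaining_missing_directors = int(toy_clean["director"].isna().sum())

print(rows_removed, toy_clean.shape, remaining_missing_directors)
```

Applied to the real data, the same subtraction should give 8807 - 7982 = 825, matching the missing `cast` count reported earlier.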