Module 6: Perform Custom EDA (Titanic) 


Author: Jaystin Garcia

Date: February 16, 2026

Purpose: This project is my own exploratory data analysis (EDA) of the Titanic passenger dataset, which contains demographic information, ticket class, fares, family relationships, and survival outcomes for passengers aboard the Titanic.

Section 1: Imports and Configuration

In [3]:
# Imports at the top of the file
# REQ.EXTERNAL.DEPS: External packages must be defined in pyproject.toml
# REQ.EXTERNAL.DEPS.INSTALLED: external packages must be installed in the environment using uv sync command
# REQ.EXTERNAL.DEPS.IMPORTED: external packages used in this notebook must be imported here


# Axes is imported as a type hint: Seaborn's axes-level plotting functions return
# a matplotlib Axes, and the title, labels, etc. are set on that object.
from matplotlib.axes import Axes
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# A figure can contain multiple axes (plots)
# from matplotlib.figure import Figure

# Pandas display configuration (helps in notebooks)
pd.set_option("display.max_columns", 50)
pd.set_option("display.width", 120)

print("Imports complete.")

Imports complete.


Section 2: Load the Data

In [4]:
# Load the Titanic dataset from Seaborn into a pandas DataFrame (2D table)
titanic_df: pd.DataFrame = sns.load_dataset("titanic")

# Preview the first few rows
titanic_df.head()

   survived  pclass     sex   age  sibsp  parch     fare  embarked  class    who  adult_male  deck  embark_town  alive  alone
0         0       3    male  22.0      1      0     7.25         S  Third    man        True   NaN  Southampton     no  False
1         1       1  female  38.0      1      0  71.2833         C  First  woman       False     C    Cherbourg    yes  False
2         1       3  female  26.0      0      0    7.925         S  Third  woman       False   NaN  Southampton    yes   True
3         1       1  female  35.0      1      0     53.1         S  First  woman       False     C  Southampton    yes  False
4         0       3    male  35.0      0      0     8.05         S  Third    man        True   NaN  Southampton     no   True


Section 3: Inspect Data and Structure

In [5]:
# Get the shape: (number of rows, number of columns)
shape: tuple[int, int] = titanic_df.shape

# Communicate the shape clearly
print(f"The titanic dataset has {shape[0]} rows and {shape[1]} columns.")

# Display column names and data types
titanic_df.info()

# List the column names
print("Column names:")
print(list(titanic_df.columns))

The titanic dataset has 891 rows and 15 columns.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 15 columns):
 #   Column       Non-Null Count  Dtype   
---  ------       --------------  -----   
 0   survived     891 non-null    int64   
 1   pclass       891 non-null    int64   
 2   sex          891 non-null    object  
 3   age          714 non-null    float64 
 4   sibsp        891 non-null    int64   
 5   parch        891 non-null    int64   
 6   fare         891 non-null    float64 
 7   embarked     889 non-null    object  
 8   class        891 non-null    category
 9   who          891 non-null    object  
 10  adult_male   891 non-null    bool    
 11  deck         203 non-null    category
 12  embark_town  889 non-null    object  
 13  alive        891 non-null    object  
 14  alone        891 non-null    bool    
dtypes: bool(2), category(2), float64(2), int64(4), object(5)
memory usage: 80.7+ KB
Column names:
['survived', 'pc
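
The info() output above reports five dtypes across the 15 columns. As a rough sketch, the columns could be grouped by dtype to see at a glance which ones are numeric, boolean, or text. The helper name group_columns_by_dtype and the small sample_df are hypothetical stand-ins for illustration, not part of the original notebook; in the notebook itself the helper would be called on titanic_df.

```python
import pandas as pd


def group_columns_by_dtype(df: pd.DataFrame) -> dict[str, list[str]]:
    """Map each dtype name to the list of columns that use it (hypothetical helper)."""
    groups: dict[str, list[str]] = {}
    for column, dtype in df.dtypes.items():
        groups.setdefault(str(dtype), []).append(str(column))
    return groups


# Tiny stand-in frame with made-up rows (assumption: not the real Titanic data)
sample_df = pd.DataFrame(
    {
        "survived": [0, 1, 1],
        "age": [22.0, None, 26.0],
        "sex": ["male", "female", "female"],
        "alone": [False, False, True],
    }
)

dtype_groups = group_columns_by_dtype(sample_df)
for dtype_name, columns in dtype_groups.items():
    print(f"{dtype_name}: {columns}")
```

Grouping by dtype like this may help decide which columns suit numeric summaries and which are better treated as categories in later sections.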

Section 4: Create a Data Dictionary and Check Data Quality

In [6]:
# Count missing values in each column
print("Missing values per column:")
print(titanic_df.isnull().sum())

# Check for duplicate rows
num_duplicates = titanic_df.duplicated().sum()
print(f"Number of duplicate rows: {num_duplicates}")

Missing values per column:
survived         0
pclass           0
sex              0
age            177
sibsp            0
parch            0
fare             0
embarked         2
class            0
who              0
adult_male       0
deck           688
embark_town      2
alive            0
alone            0
dtype: int64
Number of duplicate rows: 107
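
The raw counts above are easier to compare as percentages: age is missing in 177 of 891 rows (about 19.9%) and deck in 688 of 891 (about 77.2%). The sketch below shows one possible way to build a per-column quality table that could double as the start of a data dictionary, and to pull out duplicated rows for inspection. The function name summarize_quality and the small sample_df are hypothetical and use made-up rows; in the notebook the same calls would be applied to titanic_df.

```python
import pandas as pd


def summarize_quality(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, missing percent, unique count."""
    summary = pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isna().sum(),
            "missing_pct": (df.isna().mean() * 100).round(1),
            "n_unique": df.nunique(),
        }
    )
    # Put the columns with the most missing values first
    return summary.sort_values("missing", ascending=False)


# Tiny stand-in frame with made-up rows (assumption: not the real Titanic data)
sample_df = pd.DataFrame(
    {
        "age": [22.0, None, 26.0, 22.0],
        "deck": [None, "C", None, None],
        "fare": [7.25, 71.28, 7.92, 7.25],
    }
)

quality = summarize_quality(sample_df)
print(quality)

# keep=False flags every member of a duplicate group, not only the repeats,
# so matching rows can be viewed side by side before deciding what to do
duplicate_rows = sample_df[sample_df.duplicated(keep=False)]
print(f"Rows that belong to a duplicate group: {len(duplicate_rows)}")
```

Because the columns listed in Section 3 include no passenger name or ID, identical rows may describe different passengers who happen to share the same attributes, so dropping the 107 duplicates is not necessarily the right call.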
