PROJECT: Exploratory Data Analysis (EDA) on Titanic Dataset
📁 Step 1: Project Folder Structure
Create this folder structure:

Titanic_EDA_Project/
│
├── data/
│   └── titanic.csv
│
├── Titanic_EDA.ipynb
├── README.md
└── requirements.txt
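If you would rather create this layout from Python than by hand, a minimal sketch could look like the following. The helper name `scaffold_project` is hypothetical, and the notebook file is deliberately left out because an empty `.ipynb` file is not a valid notebook; create that one from Jupyter.

```python
from pathlib import Path


def scaffold_project(root):
    """Create the Titanic_EDA_Project layout under `root` and return its path."""
    project = Path(root) / "Titanic_EDA_Project"
    # parents=True also creates the project folder; exist_ok=True makes reruns safe
    (project / "data").mkdir(parents=True, exist_ok=True)
    # touch() creates empty placeholder files and leaves existing ones untouched
    for name in ("README.md", "requirements.txt"):
        (project / name).touch(exist_ok=True)
    return project
```

Calling `scaffold_project(".")` builds the folders in the current working directory; `data/titanic.csv` is added in Step 2.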
📊 Step 2: Download Dataset
Download Titanic dataset from:

https://www.kaggle.com/datasets/yasserh/titanic-dataset

OR use this direct CSV link:

https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv

Save it as:

data/titanic.csv
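The download can also be scripted. The sketch below uses only the standard library and assumes the direct CSV link above is reachable; the function name `fetch_dataset` is made up for illustration. It skips the download when a local copy already exists.

```python
import shutil
from pathlib import Path
from urllib.request import urlopen

DATA_URL = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"


def fetch_dataset(url=DATA_URL, dest="data/titanic.csv"):
    """Download `url` to `dest` unless the file already exists; return the path."""
    dest = Path(dest)
    if dest.exists():
        return dest  # keep the local copy, no network access needed
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response straight to disk in binary mode
    with urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

Run `fetch_dataset()` from inside `Titanic_EDA_Project/` so the relative path `data/titanic.csv` resolves to the right folder.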
📦 Step 3: Install Required Libraries
Run this in Jupyter:

!pip install pandas numpy matplotlib seaborn
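To confirm the install worked, one option is to list the installed versions. This is a small sketch using the standard-library `importlib.metadata` module (Python 3.8+); the helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED = ("pandas", "numpy", "matplotlib", "seaborn")


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

The printed versions can also be copied into `requirements.txt` if you want to pin them.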
📓 Step 4: Complete Jupyter Notebook Code (Copy Everything Below)
# ================================
# TITANIC DATASET - EDA PROJECT
# ================================

# 1. Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("whitegrid")
plt.rcParams["figure.figsize"] = (10,6)

print("Libraries Imported Successfully")
# 2. Load Dataset

df = pd.read_csv("data/titanic.csv")

print("Dataset Loaded Successfully")
df.head()
# 3. Basic Information

print("Shape of Dataset:", df.shape)
print("\nColumn Names:\n", df.columns)
print("\nData Types:\n", df.dtypes)
print("\nMissing Values:\n", df.isnull().sum())
# 4. Statistical Summary

df.describe()
# 5. Data Cleaning

# Handling Missing Values

# Fill Age with median
# (assign the result back; newer pandas warns about inplace=True on a column selection)
df["Age"] = df["Age"].fillna(df["Age"].median())

# Fill Embarked with mode
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Drop Cabin (too many missing values)
df.drop(columns=["Cabin"], inplace=True)

print("Missing Values After Cleaning:\n")
print(df.isnull().sum())
# 6. Survival Count

sns.countplot(x="Survived", data=df)
plt.title("Survival Count")
plt.show()
# 7. Survival by Gender

sns.countplot(x="Sex", hue="Survived", data=df)
plt.title("Survival by Gender")
plt.show()
# 8. Survival by Passenger Class

sns.countplot(x="Pclass", hue="Survived", data=df)
plt.title("Survival by Passenger Class")
plt.show()
# 9. Age Distribution

sns.histplot(df["Age"], kde=True, bins=30)
plt.title("Age Distribution")
plt.show()
# 10. Correlation Heatmap

# Convert categorical columns to numeric
df_encoded = df.copy()
df_encoded["Sex"] = df_encoded["Sex"].map({"male":0, "female":1})
df_encoded["Embarked"] = df_encoded["Embarked"].map({"S":0, "C":1, "Q":2})

plt.figure(figsize=(12,8))
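# numeric_only=True skips text columns such as Name and Ticket, which corr() cannot handle
sns.heatmap(df_encoded.corr(numeric_only=True), annot=True, cmap="coolwarm")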
plt.title("Correlation Heatmap")
plt.show()
# 11. Fare Distribution

sns.boxplot(x="Pclass", y="Fare", data=df)
plt.title("Fare Distribution by Class")
plt.show()
# 12. Final Insights

print("""
Key Insights:

1. Females had a higher survival rate than males.
2. First-class passengers had better survival chances than second- or third-class passengers.
3. Younger passengers had a slightly better survival rate.
4. Fare and Pclass were both associated with survival.
""")
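The insights above come from reading the plots. A minimal sketch of how you might check them numerically is shown below: because `Survived` is coded 0/1, its mean per group is the survival rate of that group. The helper name `survival_rate` and the toy rows are made up for illustration; in the notebook you would pass the real `df` instead.

```python
import pandas as pd


def survival_rate(df, by):
    """Mean of the 0/1 Survived column per group, i.e. the share who survived."""
    return df.groupby(by)["Survived"].mean()


# Made-up rows for illustration only; not taken from the real dataset
toy = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male", "female"],
    "Pclass": [1, 3, 3, 1, 3, 1],
    "Survived": [1, 1, 0, 1, 0, 0],
})

by_sex = survival_rate(toy, "Sex")
by_class = survival_rate(toy, "Pclass")
print(by_sex)
print(by_class)
```

On the real data, `survival_rate(df, "Sex")` and `survival_rate(df, "Pclass")` give the numbers behind insights 1 and 2.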
🧹 Data Cleaning Explained (You Can Add This In Markdown Cell)
Add this in a Markdown cell:

# Data Cleaning Steps

1. Filled missing Age values using median.
2. Filled missing Embarked values using mode.
3. Dropped Cabin column due to excessive missing values.
4. Converted categorical variables to numeric for correlation analysis.
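To see what the first three steps do, here is a small sketch on a made-up table. The function name `clean_titanic` and the toy rows are assumptions for illustration, not part of the original notebook. It works on a copy, so the input frame is left unchanged.

```python
import pandas as pd


def clean_titanic(df):
    """Return a cleaned copy: impute Age and Embarked, drop Cabin if present."""
    out = df.copy()
    # Median is less sensitive to extreme ages than the mean
    out["Age"] = out["Age"].fillna(out["Age"].median())
    # mode() returns a Series; [0] picks the most frequent value
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    # errors="ignore" lets the function run even if Cabin was already dropped
    return out.drop(columns=["Cabin"], errors="ignore")


# Made-up rows for illustration only; not taken from the real dataset
toy = pd.DataFrame({
    "Age": [22.0, None, 38.0, 30.0],
    "Embarked": ["S", "C", None, "S"],
    "Cabin": [None, "C85", None, None],
})
cleaned = clean_titanic(toy)
print(cleaned)
```

Here the missing age becomes 30.0 (the median of 22, 30 and 38) and the missing port becomes "S" (the most common value).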
📄 Step 5: Create README.md (Copy This)
# Titanic Dataset - Exploratory Data Analysis

## Project Overview
This project performs Exploratory Data Analysis (EDA) on the Titanic dataset.

## Objectives
- Data Cleaning
- Handling Missing Values
- Statistical Analysis
- Data Visualization
- Pattern Recognition

## Tools Used
- Python
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Jupyter Notebook

## Key Findings
- Females had a higher survival rate than males.
- 1st class passengers survived at a higher rate than other classes.
- Fare and passenger class were both associated with survival.
- Most passengers were between 20 and 40 years old.

## Author
Your Name
📦 Step 6: requirements.txt (Copy This)
pandas
numpy
matplo…
import os

# Create the main project directory and the data subfolder
os.makedirs('Titanic_EDA_Project/data', exist_ok=True)
[9:21 AM, 2/28/2026] dhiiraj_napte:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set the visual style for our charts
sns.set_theme(style="whitegrid")

# Load the Titanic dataset
url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
df = pd.read_csv(url)

# Display the first 5 rows
print("Dataset Loaded Successfully!")
df.head()
[9:21 AM, 2/28/2026] dhiiraj_napte:
# Check for missing values
print("Missing values per column:")
print(df.isnull().sum())

# Handling Missing Values
# 1. Fill 'Age' with the median age
df['Age'] = df['Age'].fillna(df['Age'].median())

# 2. Fill 'Embarked' with the most common value (mode)
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])

# 3. Drop 'Cabin' because it has too many missing values (over 70%)
df.drop(columns=['Cabin'], inplace=True)

# Confirm cleaning
print("\nMissing values after cleaning:")
print(df.isnull().sum())
[9:21 AM, 2/28/2026] dhiiraj_napte:
# Summary statistics for numerical columns
print("Statistical Summary:")
display(df.describe())

# Check unique values for categorical columns
print("\nUnique values in 'Survived' (0 = No, 1 = Yes):")
print(df['Survived'].value_counts())
[9:22 AM, 2/28/2026] dhiiraj_napte:
# Create a figure with multiple subplots
plt.figure(figsize=(15, 10))

# 1. Distribution of Age
plt.subplot(2, 2, 1)
sns.histplot(df['Age'], kde=True, color='skyblue')
plt.title('Age Distribution of Passengers')

# 2. Survival Rate by Gender
plt.subplot(2, 2, 2)
sns.barplot(x='Sex', y='Survived', data=df, palette='viridis')
plt.title('Survival Rate: Male vs Female')

# 3. Passenger Class vs Survival
plt.subplot(2, 2, 3)
sns.countplot(x='Pclass', hue='Survived', data=df, palette='Set2')
plt.title('Survival Count by Ticket Class')

# 4. Correlation Heatmap
plt.subplot(2, 2, 4)
# We only use numeric columns for the heatmap
numeric_df = df.select_dtypes(include=[np.number])
sns.heatmap(numeric_df.corr(), annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Heatmap')

plt.tight_layout()
plt.show()
📊 Project 01: Exploratory Data Analysis (EDA)
Uncovering Insights from Public Datasets
📝 Overview
This project focuses on performing a comprehensive Exploratory Data Analysis (EDA) on a publicly available dataset. The goal is to move beyond raw numbers and uncover the underlying patterns, anomalies, and relationships that drive the data.
🛠️ Tech Stack
This project leverages the standard Python data science ecosystem:
- Language: Python 3.x
- Environment: Jupyter Notebook
- Libraries:
  - Pandas: Data manipulation and cleaning.
  - NumPy: Numerical computations.
  - Matplotlib & Seaborn: Advanced data visualization.
🚀 The Workflow
1. Data Selection & Loading
I have selected the [Insert Dataset Name, e.g., Titanic / Iris / House Prices] dataset from [Kaggle/Public Source]
# Titanic Dataset - Exploratory Data Analysis

## Project Overview
This project performs Exploratory Data Analysis (EDA) on the Titanic dataset.

## Objectives
- Data Cleaning
- Handling Missing Values
- Statistical Analysis
- Data Visualization
- Pattern Recognition

## Tools Used
- Python
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Jupyter Notebook

## Key Findings
- Females had a higher survival rate than males.
- 1st class passengers survived at a higher rate than other classes.
- Fare and passenger class were both associated with survival.
- Most passengers were between 20 and 40 years old.

## Author
RUTUJA ARVIND BHENDE