# Task 02: Titanic Exploratory Data Analysis (Offline Dataset)

**Goal:** Perform basic EDA (missing values, distributions, relationships) and create clear visuals.

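This notebook reads a local CSV (`data/titanic_synthetic.csv`), so it runs **without an internet connection**. The load cell reaches the file through the relative path `../../data/titanic_synthetic.csv`.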
> Note: This is a *synthetic Titanic-like dataset* created for learning purposes.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt

DATA_PATH = "../../data/titanic_synthetic.csv"
df = pd.read_csv(DATA_PATH)

df.head()

In [None]:
df.info()

## 1) Missing values

In [None]:
df.isna().mean().sort_values(ascending=False)
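
Fractions alone can hide how many rows are affected, so it may help to view counts next to them. The following is a minimal sketch, assuming a small hypothetical frame `toy` standing in for `df`. Only the column names `age`, `fare`, and `sex` come from this notebook; the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df; values are invented for illustration only.
toy = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, np.nan, 54.0],
    "fare": [7.25, 71.28, 8.05, np.nan, 51.86],
    "sex": ["male", "female", "female", "male", "male"],
})

# One row per column: number of missing cells and the fraction they represent.
missing = pd.DataFrame({
    "n_missing": toy.isna().sum(),
    "frac_missing": toy.isna().mean(),
}).sort_values("frac_missing", ascending=False)

print(missing)
```

Swapping `toy` for `df` should give the same table for the real columns.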

In [None]:
# Simple imputation for analysis: fill missing ages with the median,
# writing to a new column so the original "age" stays untouched
df["age_filled"] = df["age"].fillna(df["age"].median())
df[["age","age_filled"]].head()
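
Median imputation removes the gaps but also erases the record of which rows were filled. One possible way to keep that information is an indicator column. This is a sketch on a hypothetical frame `toy`; the column name `age_was_missing` is an assumption introduced here, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df; ages are invented for illustration only.
toy = pd.DataFrame({"age": [22.0, np.nan, 35.0, np.nan, 54.0, 28.0]})

# Record which rows lacked an age before filling (hypothetical column name).
toy["age_was_missing"] = toy["age"].isna()

# Same median fill as in the cell above.
age_median = toy["age"].median()
toy["age_filled"] = toy["age"].fillna(age_median)

# Filling with the median keeps the median but shrinks the spread.
print("median before/after:", age_median, toy["age_filled"].median())
print("std before/after:", toy["age"].std(), toy["age_filled"].std())
```

The smaller standard deviation after filling is worth keeping in mind when reading any age-based plot built from `age_filled`.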

## 2) Survival rate by sex

In [None]:
survival_by_sex = df.groupby("sex")["survived"].mean().sort_values(ascending=False)
survival_by_sex
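
A group mean is easier to trust when the group size is shown with it. The sketch below, on a hypothetical frame `toy` with invented values, reports the survival rate and the passenger count per group. The output column names `survival_rate` and `n_passengers` are assumptions chosen for readability.

```python
import pandas as pd

# Hypothetical stand-in for df; values are invented for illustration only.
toy = pd.DataFrame({
    "sex": ["female", "female", "female", "male", "male", "male", "male"],
    "survived": [1, 1, 0, 0, 0, 1, 0],
})

# Mean of a 0/1 column is the survival rate; count is the group size
# (rows with a non-missing "survived" value).
summary = (
    toy.groupby("sex")["survived"]
    .agg(["mean", "count"])
    .rename(columns={"mean": "survival_rate", "count": "n_passengers"})
    .sort_values("survival_rate", ascending=False)
)

print(summary)
```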

In [None]:
plt.figure(figsize=(6,4))
plt.bar(survival_by_sex.index, survival_by_sex.values)
plt.title("Survival Rate by Sex")
plt.xlabel("Sex")
plt.ylabel("Survival rate")
plt.ylim(0,1)
plt.tight_layout()
plt.show()

## 3) Survival rate by passenger class

In [None]:
survival_by_class = df.groupby("pclass")["survived"].mean().sort_index()
survival_by_class

In [None]:
plt.figure(figsize=(6,4))
plt.bar(survival_by_class.index.astype(str), survival_by_class.values)
plt.title("Survival Rate by Passenger Class")
plt.xlabel("Passenger class")
plt.ylabel("Survival rate")
plt.ylim(0,1)
plt.tight_layout()
plt.show()
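
Sections 2 and 3 look at sex and passenger class one at a time. To see how the two interact, a pivot table of survival rates is one option. This is a sketch on a hypothetical frame `toy` with invented values, reusing the column names `sex`, `pclass`, and `survived` from this notebook.

```python
import pandas as pd

# Hypothetical stand-in for df; values are invented for illustration only.
toy = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "female", "male", "female", "male"],
    "pclass": [1, 1, 1, 1, 3, 3, 3, 3],
    "survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Rows: sex, columns: passenger class, cells: mean survival in that group.
rate_table = toy.pivot_table(
    index="sex", columns="pclass", values="survived", aggfunc="mean"
)

print(rate_table)
```

Cells built from only a few passengers can swing widely, so pairing this table with group counts is advisable before drawing conclusions.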

## 4) Fare distribution (a log-scaled count axis helps)

In [None]:
plt.figure(figsize=(7,4))
plt.hist(df["fare"], bins=30)
plt.title("Fare Distribution")
plt.xlabel("Fare")
plt.ylabel("Count")
plt.tight_layout()
plt.show()

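# Same histogram, but log=True puts the count axis on a log scale
# so sparsely populated high-fare bins remain visible
plt.figure(figsize=(7,4))
plt.hist(df["fare"], bins=30, log=True)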
plt.title("Fare Distribution (log count)")
plt.xlabel("Fare")
plt.ylabel("Count (log)")
plt.tight_layout()
plt.show()
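
Note that `log=True` in `plt.hist` rescales the count axis, not the fares themselves. A complementary view is to transform the fare values and compare skewness before and after. The sketch below uses `np.log1p` on a hypothetical frame `toy` with invented fares; this transform is an addition for illustration, not something the cells above perform.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df; fares are invented for illustration only.
toy = pd.DataFrame({
    "fare": [7.25, 8.05, 7.9, 13.0, 26.0, 15.5, 71.3, 263.0, 10.5, 512.3]
})

# log1p(x) = log(1 + x), which stays defined for a fare of 0.
toy["log_fare"] = np.log1p(toy["fare"])

# Sample skewness: large positive values indicate a long right tail.
raw_skew = toy["fare"].skew()
log_skew = toy["log_fare"].skew()

print("skew of fare:", raw_skew)
print("skew of log1p(fare):", log_skew)
```

Passing `toy["log_fare"]` to `plt.hist` would then show the compressed distribution directly.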

## Summary
- Measured the fraction of missing values per column and filled missing `age` with the median (stored in `age_filled`).
- Compared survival rates across **sex** and **passenger class**.
- Plotted the fare distribution with linear and log-scaled counts to inspect its skew.
