# Titanic EDA (Sample)
This notebook performs exploratory data analysis (EDA) on the provided `titanic_dataset_sample.csv`. It covers a data summary, a missing-value check, univariate and bivariate visualizations, a correlation heatmap, and a short list of observations.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')

# Load dataset
df = pd.read_csv('titanic_dataset_sample.csv')
df.head()

In [None]:
# Basic info
df.info()
df.describe(include='all')

In [None]:
# Missing values
df.isnull().sum()

In [None]:
# Univariate: distributions
plt.figure(figsize=(8,4))
df['Age'].hist(bins=10)
plt.title('Age distribution')
plt.xlabel('Age')
plt.ylabel('Count')
plt.show()

In [None]:
# Bivariate: Survived vs Pclass
sns.countplot(data=df, x='Pclass', hue='Survived')
plt.title('Survival by Pclass')
plt.show()
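
The count plot above shows raw counts, which can be hard to compare when the classes differ in size. A minimal sketch of the corresponding survival rates follows. It assumes `Survived` is coded 0/1, as in the usual Titanic schema; the helper name `survival_rate_by` and the toy rows are hypothetical and exist only to make the example runnable. In the notebook, pass `df` instead.

```python
import pandas as pd


def survival_rate_by(frame: pd.DataFrame, column: str) -> pd.Series:
    """Return the share of survivors within each level of `column`.

    Assumes `Survived` is coded 0/1, so the group mean equals the survival rate.
    """
    return frame.groupby(column)['Survived'].mean().sort_index()


# Toy rows for illustration only; NOT taken from titanic_dataset_sample.csv.
toy = pd.DataFrame({
    'Pclass':   [1, 1, 2, 2, 3, 3, 3, 3],
    'Survived': [1, 1, 1, 0, 0, 0, 1, 0],
})

rates = survival_rate_by(toy, 'Pclass')
print(rates)
```

Calling `survival_rate_by(df, 'Pclass')` on the loaded sample gives one rate per class, which can be read alongside the count plot.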

In [None]:
# Correlation heatmap (numeric cols)
num_cols = df.select_dtypes(include=np.number).columns.tolist()
plt.figure(figsize=(6,4))
sns.heatmap(df[num_cols].corr(), annot=True, fmt='.2f')
plt.title('Correlation heatmap')
plt.show()
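
The heatmap includes every numeric column, so identifier-like columns can clutter it. As a rough sketch, the correlations with a single target column can also be listed directly, ordered by absolute strength. The helper name `correlations_with` is hypothetical, and `PassengerId` is an assumed column name that may not exist in this sample; it is dropped only if present.

```python
import pandas as pd


def correlations_with(frame: pd.DataFrame, target: str, drop=()) -> pd.Series:
    """Pearson correlation of each numeric column with `target`,
    sorted by absolute value (strongest first)."""
    numeric = frame.select_dtypes(include='number')
    # Ignore columns listed in `drop` if they are missing from the frame.
    numeric = numeric.drop(columns=list(drop), errors='ignore')
    corr = numeric.corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Toy rows for illustration only; NOT taken from titanic_dataset_sample.csv.
toy = pd.DataFrame({
    'PassengerId': [1, 2, 3, 4],        # assumed identifier column
    'Survived':    [1, 1, 0, 0],
    'Pclass':      [1, 1, 3, 3],
    'Age':         [30.0, 40.0, 35.0, 25.0],
})

ranked = correlations_with(toy, 'Survived', drop=['PassengerId'])
print(ranked)
```

Pearson correlation treats `Pclass` as a numeric scale, so a negative value here means that a larger class number goes with lower survival in the data passed in.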

## Observations
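- The cells above report the column summary (`df.info()`, `df.describe()`), the per-column missing-value counts (`df.isnull().sum()`), and the plotted relationships; check the notes below against those outputs.
- `Pclass` appears associated with survival in many Titanic analyses: first-class passengers (`Pclass` = 1) typically had higher survival rates than third-class passengers. Confirm this against the count plot for this sample.
- `Age` has missing values in this sample; consider an imputation strategy (for example, median imputation) before modeling on the full dataset.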
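
One possible imputation strategy for `Age` is sketched below: fill missing values with the median, either overall or within groups such as `Pclass`. This is an illustration under stated assumptions, not the approach this notebook prescribes. The helper name `impute_age` and the added `Age_missing` flag column are hypothetical, and the toy rows are made up so that the block runs on its own.

```python
import numpy as np
import pandas as pd


def impute_age(frame: pd.DataFrame, by=None) -> pd.DataFrame:
    """Return a copy with missing `Age` filled by the median.

    If `by` is given (e.g. 'Pclass'), the median is taken within each group.
    A boolean `Age_missing` column records which rows were filled.
    """
    out = frame.copy()
    out['Age_missing'] = out['Age'].isna()
    if by is None:
        fill = out['Age'].median()
    else:
        # Per-row group median, aligned to the original index.
        fill = out.groupby(by)['Age'].transform('median')
    out['Age'] = out['Age'].fillna(fill)
    # Fallback for groups where every Age is missing.
    out['Age'] = out['Age'].fillna(frame['Age'].median())
    return out


# Toy rows for illustration only; NOT taken from titanic_dataset_sample.csv.
toy = pd.DataFrame({
    'Pclass': [1, 1, 1, 3, 3, 3],
    'Age':    [40.0, np.nan, 50.0, 20.0, 22.0, np.nan],
})

overall = impute_age(toy)               # one median for all rows
grouped = impute_age(toy, by='Pclass')  # median within each class
print(grouped)
```

Keeping the `Age_missing` flag makes it possible to check later whether missingness itself relates to survival, which simple median filling would otherwise hide.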
