# Day 8: Exploratory Data Analysis & Data Storytelling - Starter Notebook

Welcome to Day 8! This notebook guides you through EDA and data storytelling.

## Learning Objectives
- Understand the EDA process and its importance
- Use Python tools to explore and summarize data
- Visualize data for insight and communication
- Tell a compelling story with data

## Instructions
Complete each exercise section below. Refer to `docs/day_8_eda_storytelling.md` for detailed guidance.

---
## Setup
Run the cell below to import required libraries.

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Display settings
%matplotlib inline
sns.set_theme(style="whitegrid")
pd.set_option('display.max_columns', None)

print("Libraries imported successfully!")

---
## Exercise 1: EDA on a Real Dataset

**Deliverables:**
1. Perform EDA on a provided dataset (summary stats, missing values, distributions).

**Success Criteria:**
- Key characteristics of the data are identified
- Issues (missing values, outliers) are noted

**Dataset:** Use the Titanic dataset at `../data/titanic.csv`
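
As a rough sketch of the checks listed in the deliverables, the example below runs a first pass on a small in-memory table rather than the real file. The `toy` DataFrame, its column names (`age`, `fare`, `cabin`, `survived`), and all of its values are invented for illustration and are not taken from `../data/titanic.csv`. The 1.5 × IQR rule shown for outliers is one common convention, not the only option.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a passenger table; column names and values are invented.
toy = pd.DataFrame({
    "age": [22.0, 38.0, np.nan, 35.0, 54.0, 2.0],
    "fare": [7.25, 71.28, 7.92, 53.10, 512.33, 21.08],
    "cabin": [None, "C85", None, "C123", "E46", None],
    "survived": [0, 1, 1, 1, 0, 0],
})

# Size and column types
print(toy.shape)
print(toy.dtypes)

# Summary statistics for the numeric columns
summary = toy.describe()
print(summary)

# Missing values: absolute counts and share of rows per column
missing = toy.isnull().sum()
missing_share = toy.isnull().mean().round(2)
print(pd.DataFrame({"missing": missing, "share": missing_share}))

# Flag potential outliers in one column with the 1.5 * IQR rule
q1, q3 = toy["fare"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = toy[(toy["fare"] < q1 - 1.5 * iqr) | (toy["fare"] > q3 + 1.5 * iqr)]
print(outliers)
```

Reporting the missing share alongside the count makes it easier to judge whether a column is usable: a column that is half empty calls for a different decision than one with a single gap.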

In [None]:
# TODO: Load the dataset
# Hint: Use pd.read_csv("../data/titanic.csv")
df = None  # Replace with your code

In [None]:
# TODO: Get basic info about the dataset
# Hint: Use df.shape, df.dtypes, df.info()


In [None]:
# TODO: Summary statistics
# Hint: Use df.describe()


In [None]:
# TODO: Check for missing values
# Hint: Use df.isnull().sum()


In [None]:
# TODO: Examine distributions
# Hint: Use df.hist() or individual column histograms


---
## Exercise 2: Data Visualization

**Deliverables:**
1. Create at least three different types of plots to explore relationships in the data.

**Success Criteria:**
- Plots are clear and informative
- Visualizations reveal insights

In [None]:
# TODO: Create a histogram or distribution plot
# Hint: Use sns.histplot() or plt.hist()


In [None]:
# TODO: Create a boxplot to identify outliers
# Hint: Use sns.boxplot()


In [None]:
# TODO: Create a scatter plot or pairplot
# Hint: Use sns.scatterplot() or sns.pairplot()


In [None]:
# TODO: Create a correlation heatmap
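# Hint: Use sns.heatmap(df.corr(numeric_only=True), annot=True)
# (numeric_only=True skips text columns, which would otherwise raise an error in pandas 2.0+)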


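
Correlation is only defined for numeric columns, so on a table that mixes numbers and text it helps to restrict the calculation before plotting. The sketch below shows two equivalent ways to do that with pandas. The `toy` DataFrame and its values are invented for illustration.

```python
import pandas as pd

# Invented example with one text column mixed in among numeric ones.
toy = pd.DataFrame({
    "age": [22, 38, 26, 35, 54],
    "fare": [7.25, 71.28, 7.92, 53.10, 51.86],
    "sex": ["male", "female", "female", "female", "male"],
})

# Option 1: ask corr() to ignore non-numeric columns
corr = toy.corr(numeric_only=True)

# Option 2: select the numeric columns first, then correlate
corr_alt = toy.select_dtypes(include="number").corr()

print(corr)
```

Either matrix can then be handed to a heatmap function. The text column is left out rather than silently encoded, which keeps the plot honest about what was measured.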
---
## Exercise 3: Data Storytelling

**Deliverables:**
1. Present a short narrative using your EDA and visualizations.

**Success Criteria:**
- Story is clear and supported by data
- Visuals enhance the narrative

### Your Data Story

*Write your narrative here. Use markdown to structure your story with headers, bullet points, and embedded visualizations.*

**Key Questions to Answer:**
1. What is the main question or insight you're exploring?
2. What did the data reveal?
3. What are the key takeaways?
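
One way to connect these questions to a chart is to put the takeaway in the title and label the values directly, so the reader does not have to decode the axes. The sketch below is only an illustration: the `toy` DataFrame, the `group` and `outcome` columns, and the numbers are all invented.

```python
import matplotlib
matplotlib.use("Agg")  # lets the sketch run as a script; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented data: a category and a binary outcome per row.
toy = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B", "B", "C", "C", "C"],
    "outcome": [1, 1, 0, 1, 0, 0, 1, 0, 0, 0],
})

# Outcome rate per group, highest first, so the chart reads in order of importance
rates = toy.groupby("group")["outcome"].mean().sort_values(ascending=False)

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(rates.index.tolist(), rates.values, color="steelblue")

# Label each bar with its percentage
ax.bar_label(bars, labels=[f"{r:.0%}" for r in rates.values])

# State the takeaway in the title instead of a neutral description
ax.set_title(f"Group {rates.index[0]} has the highest outcome rate")
ax.set_ylabel("Share with outcome = 1")
ax.set_ylim(0, 1)
fig.tight_layout()
```

Sorting the bars and writing the conclusion into the title are design choices, not requirements. They tend to work well when the chart has to support a single sentence of the narrative.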

---

TODO: Write your data story below


In [None]:
# TODO: Create a final visualization that supports your story


---
## Validation Checklist

Before proceeding to the next day, verify:
- [ ] EDA performed on a real dataset
- [ ] Multiple visualizations created (at least 3 different types)
- [ ] Data story presented clearly with narrative