# 📈 EDA & Visualization

**Author**: Data Science Master System  
**Difficulty**: ⭐ Beginner  
**Time**: 30 minutes  
**Prerequisites**: 02_basic_data_analysis

## Learning Objectives
- Create effective visualizations
- Understand data distributions
- Find correlations and patterns
- Tell stories with data

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('husl')
%matplotlib inline

## 1. Load Sample Data

In [None]:
np.random.seed(42)
df = pd.DataFrame({
    'age': np.random.randint(18, 70, 200),
    'income': np.random.normal(50000, 15000, 200),
    'spending': np.random.normal(500, 150, 200),
    'category': np.random.choice(['A', 'B', 'C'], 200)
})
df['income'] = df['income'].clip(lower=20000)
df.head()

## 2. Distribution Plots

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Histogram
axes[0].hist(df['income'], bins=20, edgecolor='white')
axes[0].set_title('Income Distribution')
axes[0].set_xlabel('Income')

# KDE
sns.kdeplot(data=df, x='age', ax=axes[1], fill=True)
axes[1].set_title('Age Density')

# Box plot
sns.boxplot(data=df, x='category', y='spending', ax=axes[2])
axes[2].set_title('Spending by Category')

plt.tight_layout()
plt.show()

## 3. Relationship Plots

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Scatter plot
sc = axes[0].scatter(df['income'], df['spending'], alpha=0.5, c=df['age'], cmap='viridis')
fig.colorbar(sc, ax=axes[0], label='Age')  # explain what the point colors encode
axes[0].set_xlabel('Income')
axes[0].set_ylabel('Spending')
axes[0].set_title('Income vs Spending')

# Correlation heatmap
corr = df[['age', 'income', 'spending']].corr()
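# Pin the color scale to [-1, 1] so the midpoint of the diverging colormap means "no correlation"
sns.heatmap(corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1, ax=axes[1])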
axes[1].set_title('Correlation Matrix')

plt.tight_layout()
plt.show()
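
The numbers in the heatmap are Pearson correlation coefficients. Because `age`, `income`, and `spending` above are drawn independently, their off-diagonal values should sit close to zero. The sketch below uses a separate, hypothetical dataset (`demo`, where `y` is built from `x` plus noise) to show what the coefficient measures; `pearson_r` is an illustrative helper, not part of pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical data: y depends linearly on x plus noise,
# so the correlation should be clearly positive.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(size=500)
demo = pd.DataFrame({'x': x, 'y': y})


def pearson_r(a, b):
    """Pearson correlation: covariance scaled by both standard deviations."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    numerator = (a_centered * b_centered).sum()
    denominator = np.sqrt((a_centered ** 2).sum() * (b_centered ** 2).sum())
    return float(numerator / denominator)


r_manual = pearson_r(demo['x'], demo['y'])
r_pandas = demo['x'].corr(demo['y'])  # same quantity that df.corr() reports
print(f"manual r = {r_manual:.4f}, pandas r = {r_pandas:.4f}")
```

The two values should agree up to floating-point error, which is a quick way to confirm what each cell of the correlation matrix represents.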

## 4. Categorical Analysis

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Bar chart
df['category'].value_counts().plot(kind='bar', ax=axes[0])
axes[0].set_title('Category Counts')

# Pie chart
df['category'].value_counts().plot(kind='pie', autopct='%1.1f%%', ax=axes[1])
axes[1].set_title('Category Distribution')

plt.tight_layout()
plt.show()
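
The bar and pie charts both come from `value_counts()`. As a small sketch, the same information can be read as a table, which is often easier to compare than pie slices. The `categories` series here is a hypothetical stand-in for `df['category']`.

```python
import pandas as pd

# Hypothetical stand-in for df['category']
categories = pd.Series(['A', 'B', 'A', 'C', 'A', 'B', 'C', 'A'], name='category')

counts = categories.value_counts()                      # heights of the bars
shares = categories.value_counts(normalize=True) * 100  # percentages shown on the pie

# Both series are indexed by category label, so they align automatically.
summary = pd.DataFrame({'count': counts, 'percent': shares.round(1)})
print(summary)
```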

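## 🎯 Key Takeaways
1. Histograms and KDE plots show the shape of a single numeric distribution
2. Scatter plots show the relationship between two numeric variables
3. Heatmaps summarize pairwise correlations at a glance
4. Box plots compare groups and flag potential outliers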
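
To make the last takeaway concrete: with default settings, matplotlib and seaborn box plots draw points beyond 1.5 × IQR from the quartiles as individual outlier markers. The sketch below applies that same rule numerically. `iqr_outliers` and the `sample` values are hypothetical, chosen only for illustration.

```python
import numpy as np


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] together with the fences."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    mask = (values < lower) | (values > upper)
    return values[mask], (lower, upper)


# Hypothetical spending values with one unusually large entry
sample = np.array([480, 510, 495, 530, 470, 505, 515, 1200], dtype=float)
outliers, (lower, upper) = iqr_outliers(sample)
print(f"fences: [{lower}, {upper}], outliers: {outliers}")
```

Here the fences work out to 450 and 560, so only the value 1200 is flagged. A flagged point is a candidate for inspection, not automatically an error.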

**Next**: 04_first_ml_model.ipynb