# Day 2: Core Chart Types Practice

## 1. Objective
- Practice making and annotating common charts using matplotlib/seaborn

## 2. Key Steps
- Load Titanic dataset
- Create bar chart, histogram, and line chart
- Annotate each with markdown and apply design principles


## Bar Chart: Survival Distribution
**Takeaway:** Most passengers did not survive the Titanic disaster.

**Design Notes:**
- Use color to highlight survivor category
- Directly label bars; avoid chartjunk


In [None]:
# Bar chart: Survival count (improved version)
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset('titanic')
survival_counts = df['survived'].value_counts()
labels = ['Did Not Survive', 'Survived']
counts = [survival_counts[0], survival_counts[1]]

fig, ax = plt.subplots()
bars = ax.bar(labels, counts, color=['grey', 'green'])

# Add data labels on top of bars
for bar in bars:
    height = bar.get_height()
    ax.annotate(f'{height}', xy=(bar.get_x() + bar.get_width() / 2, height),
                xytext=(0, 5), textcoords="offset points",
                ha='center', va='bottom', fontsize=10)

# Improved title and formatting
ax.set_title('Over 60% of Passengers Did Not Survive', fontsize=12)
ax.set_ylabel('Passenger Count')
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.grid(axis='y', linestyle='--', alpha=0.3)

plt.tight_layout()
plt.show()

## Bar Chart: Survival Distribution  
### *"Over 60% of Passengers Did Not Survive"*

### What Works Well

- **Insightful Title:** Leads with the takeaway, not just a variable label. This primes the audience for the key message, a practice Cole Nussbaumer Knaflic recommends in *Storytelling with Data*.
- **Minimal Clutter:** No distracting chartjunk. Light gridlines and removed borders keep attention on the data.
- **Effective Use of Color:** Grey vs. green provides intuitive contrast. The green highlights the positive class (“Survived”), drawing focus.
- **Direct Labeling:** Numeric values above the bars reduce reliance on the y-axis and make comparison effortless.
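
As a possible extension of the direct-labeling idea, the sketch below prints each bar's share of the total next to its count, so the "over 60%" claim in the title can be read straight off the chart. The helper name `label_bars_with_share` is hypothetical, and the sketch assumes matplotlib 3.4 or newer, where `Axes.bar_label` is available.

```python
import matplotlib.pyplot as plt


def label_bars_with_share(ax, bars, counts):
    """Label each bar with its count and its share of the total.

    Hypothetical helper: `bars` is the container returned by ax.bar,
    `counts` is the list of bar heights in the same order.
    """
    total = sum(counts)
    texts = [f"{c} ({c / total:.0%})" for c in counts]
    # bar_label places one text per bar, just above the bar's edge
    ax.bar_label(bars, labels=texts, padding=5, fontsize=10)
    return texts
```

In the cell above, this would replace the manual `ax.annotate` loop with a single call: `label_bars_with_share(ax, bars, counts)`.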

### Recommendations

- **Narrative Context:** Consider adding a caption beneath the chart explaining why this distribution matters.
- **Bar Order:** Placing “Survived” first might soften the emotional impact or suit a hopeful framing, depending on your story.
- **Visual Polish:** Slightly increasing the title font or making it bold can enhance readability.
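
The three recommendations can be tried out together in a small function. This is only a sketch: `plot_survival_bar`, its `survived_first` flag, and the caption placement are illustrative choices rather than part of the original cell, and any caption wording is up to the author.

```python
import matplotlib.pyplot as plt


def plot_survival_bar(counts, survived_first=False, caption=None):
    """Survival bar chart with a bold title, optional bar reordering and caption.

    `counts` maps 0 (did not survive) and 1 (survived) to passenger counts,
    e.g. the result of df['survived'].value_counts().
    """
    order = [1, 0] if survived_first else [0, 1]
    names = {0: "Did Not Survive", 1: "Survived"}
    colors = {0: "grey", 1: "green"}

    fig, ax = plt.subplots()
    ax.bar([names[k] for k in order],
           [counts[k] for k in order],
           color=[colors[k] for k in order])

    # Visual polish: slightly larger, bold title
    ax.set_title("Over 60% of Passengers Did Not Survive",
                 fontsize=14, fontweight="bold")
    ax.set_ylabel("Passenger Count")
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)

    if caption:
        # Narrative context: figure-level text under the axes
        fig.text(0.5, 0.01, caption, ha="center", fontsize=9, style="italic")
        fig.tight_layout(rect=(0, 0.05, 1, 1))  # reserve room for the caption
    else:
        fig.tight_layout()
    return fig, ax
```

In the notebook, a call such as `plot_survival_bar(df['survived'].value_counts(), survived_first=True, caption="...")` followed by `plt.show()` lets you compare both bar orders side by side before settling on a framing.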

### Design Principle Alignment (from *Storytelling with Data*)

| Principle                    | Notes                                           |
|------------------------------|--------------------------------------------------|
| Understand the context       | Title sets up the insight clearly               |
| Choose the right display     | Bar chart suits categorical comparison          |
| Eliminate clutter            | Simple visual with clear focus                  |
| Focus attention              | Color and layout guide the viewer's eye         |
| Think like a designer        | Clean font, spacing, and emphasis               |
| Tell a story                 | Viewer immediately grasps the key takeaway      |

## Histogram: Age Distribution
**Takeaway:** The age distribution is right-skewed, with a concentration in the 20–40 range.

**Design Notes:**
- Use consistent color scheme
- Clear axis labels and chart title


In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset('titanic')
ages = df['age'].dropna()
median_age = ages.median()

# Plot
plt.figure(figsize=(8, 5))
plt.hist(ages, bins=20, color='steelblue', edgecolor='black')

# Add vertical line for median
plt.axvline(median_age, color='darkred', linestyle='--', linewidth=2, label=f'Median Age: {median_age:.1f}')

# Titles and labels
plt.title('Most Titanic Passengers Were Between 20–40 Years Old', fontsize=12)
plt.xlabel('Age')
plt.ylabel('Passenger Count')

# Clean layout and annotation
plt.legend()
plt.grid(axis='y', linestyle='--', alpha=0.3)
plt.tight_layout()
plt.show()

## Histogram: Age Distribution  
### *"Most Titanic Passengers Were Between 20–40 Years Old"*

### What Works Well

- **Insightful Title:** The chart leads with the message, not just the metric. This immediately guides the audience to what matters: the core age range.
- **Median Line as Anchor:** The dashed vertical line at age 28 provides a visual benchmark, helping viewers interpret the distribution relative to a familiar reference.
- **Clean, Uncluttered Layout:** Gridlines are subtle, borders are clean, and no unnecessary elements distract from the story.
- **Legend and Labeling:** The median is clearly labeled and color-coded, with an appropriate legend. Axis labels are simple and effective.

### Recommendations

- **Optional Enhancement:** You could add a shaded box or light annotation to visually group the 20–40 range, reinforcing the claim made in the title.
- **Clarify Implication (in markdown):** Consider pairing this chart with a short explanation of why age mattered for survival, if known.
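
One way to try the shaded-band suggestion is `axvspan`, which fills a vertical region between two x-values. The sketch below also computes the share of recorded ages inside the band, which is a quick check on the title's "most" claim. Note that this share covers only passengers with a recorded age, since missing ages are dropped. The function name and the gold color are illustrative assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_age_hist_with_band(ages, low=20, high=40):
    """Age histogram with a shaded low-high band and a median line.

    Returns the figure, the axes, and the share of recorded ages that fall
    inside the band (both ends inclusive).
    """
    ages = pd.Series(ages).dropna()
    share = ages.between(low, high).mean()
    median_age = ages.median()

    fig, ax = plt.subplots(figsize=(8, 5))
    ax.hist(ages, bins=20, color="steelblue", edgecolor="black")

    # Light band that visually groups the age range named in the title
    ax.axvspan(low, high, color="gold", alpha=0.2,
               label=f"Ages {low}-{high}: {share:.0%} of recorded ages")
    ax.axvline(median_age, color="darkred", linestyle="--", linewidth=2,
               label=f"Median Age: {median_age:.1f}")

    ax.set_xlabel("Age")
    ax.set_ylabel("Passenger Count")
    ax.legend()
    fig.tight_layout()
    return fig, ax, share
```

Calling `plot_age_hist_with_band(df['age'])` in the notebook returns the share alongside the figure, so the title can be adjusted if the number does not support it.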

### Design Principle Alignment (from *Storytelling with Data*)

| Principle                    | Notes                                              |
|------------------------------|----------------------------------------------------|
| Understand the context       | Chart directly supports demographic analysis       |
| Choose the right display     | Histogram is ideal for continuous variable (age)   |
| Eliminate clutter            | Design is streamlined with only relevant features  |
| Focus attention              | Median line draws the eye and provides context     |
| Think like a designer        | Thoughtful use of spacing, color, and hierarchy    |
| Tell a story                 | Viewer leaves knowing the main distribution shape  |

## Line Chart: Simulated Embarkation Trend
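**Note:** The Titanic data has no timestamps, so there is no real trend to plot. Passenger counts per embarkation port stand in as placeholder data for line chart practice.

**Design Notes:**
- Line charts normally show a trend over time; here the x-axis holds categories (ports), so the connecting line implies no real sequence
- Use markers and color to highlight the line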


In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset('titanic')
# Map port codes to names explicitly instead of relying on the sorted order
port_names = {'C': 'Cherbourg', 'Q': 'Queenstown', 'S': 'Southampton'}
sim_counts = df['embarked'].value_counts().sort_index().rename(index=port_names)

# Plot line chart
plt.figure(figsize=(7, 4))
plt.plot(sim_counts.index, sim_counts.values, marker='o', color='orange', linewidth=2)

# Updated title to reflect insight
plt.title('Most Passengers Embarked from Southampton', fontsize=12)

# Label axes
plt.xlabel('Port of Embarkation')
plt.ylabel('Passenger Count')

# Tidy up visual design
plt.grid(axis='y', linestyle='--', alpha=0.3)
plt.tight_layout()
plt.show()

## Line Chart: Port of Embarkation  
### *"Most Passengers Embarked from Southampton"*

### What Works Well

- **Insightful Title:** Rather than simply naming the variable (“Embarkation Trend”), the title tells the viewer the most important takeaway: Southampton had the highest passenger count.
- **Minimal Clutter:** Gridlines are soft and subtle. Axes are labeled clearly and simply. The orange line and markers stand out without being overwhelming.
- **Design Simplicity:** A single color, no legend needed, and tight layout makes the chart digestible at a glance.

### Suggestions (for learning purposes)

- **Chart Type Note:** While line charts are ideal for trends over time or continuous variables, this data is categorical. A bar chart would typically be preferred to avoid implying continuity.
- **Optional Additions:** You could annotate the highest point (Southampton) or show a dotted baseline for comparison, though it’s not necessary for this case.
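
For comparison, here is a sketch of the bar chart alternative mentioned above, with the busiest port highlighted and each bar labeled directly. The function name `plot_embarkation_bar` and the grey/orange palette are assumptions made for illustration; `Axes.bar_label` requires matplotlib 3.4 or newer.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_embarkation_bar(embarked):
    """Bar chart of passengers per port, with the busiest port highlighted.

    `embarked` is a sequence of port codes ('C', 'Q', 'S'); missing values
    are dropped before counting.
    """
    port_names = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}
    # value_counts sorts from most to fewest passengers
    counts = pd.Series(embarked).dropna().map(port_names).value_counts()

    top_port = counts.idxmax()
    colors = ["orange" if port == top_port else "lightgrey"
              for port in counts.index]

    fig, ax = plt.subplots(figsize=(7, 4))
    bars = ax.bar(list(counts.index), list(counts.values), color=colors)
    ax.bar_label(bars, padding=3)  # direct labels instead of reading the axis

    ax.set_title(f"Most Passengers Embarked from {top_port}", fontsize=12)
    ax.set_xlabel("Port of Embarkation")
    ax.set_ylabel("Passenger Count")
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    fig.tight_layout()
    return fig, ax, counts
```

In the notebook this would be called as `plot_embarkation_bar(df['embarked'])`. Separate bars avoid suggesting any continuity between ports, and the highlight color does the work that the line's peak did in the original chart.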

### Design Principle Alignment (from *Storytelling with Data*)

| Principle                    | Notes                                               |
|------------------------------|-----------------------------------------------------|
| Understand the context       | Chart makes clear this is about boarding locations  |
| Choose the right display     | Line chart used for practice; bar chart would fit better |
| Eliminate clutter            | No distractions; clean layout                      |
| Focus attention              | Peak of the line draws the eye to Southampton      |
| Think like a designer        | Good use of whitespace and consistent styling       |
| Tell a story                 | Title leads with the main finding                  |

## 4. Summary

### Key Accomplishments
- Practiced creating and annotating three foundational chart types using the Titanic dataset:
  - Bar chart for survival distribution
  - Histogram for passenger ages
  - Line chart for embarkation counts (categorical stand-in for trend practice, since the data has no timestamps)

### Design & Storytelling Principles Applied
| Chart Type     | Title (Insight-Based)                                 | Key Design Features |
|----------------|--------------------------------------------------------|---------------------|
| Bar Chart      | *Over 60% of Passengers Did Not Survive*              | Highlight color, direct labels, takeaway title |
| Histogram      | *Most Titanic Passengers Were Between 20–40 Years Old*| Median line, right-skew recognition, clean layout |
| Line Chart     | *Most Passengers Embarked from Southampton*           | Peak emphasis, minimal clutter, insight-led headline |

### Reflections
- Titles matter: leading with the takeaway improves comprehension and engagement.
- Anchors (like the median line) help guide interpretation without additional explanation.
- Simplicity is powerful: removing clutter makes insights easier to spot.
- Even when a line chart isn’t the “ideal” display, it can still be used thoughtfully for layout and trend practice.

### Next Steps
These storytelling techniques will now be applied to visualizing real ML results from Block 5 in the upcoming days.
