---
title: "Interactive Session 4C"
subtitle: "🔄 Quick Tour: 9-Step Data Science Workflow"
jupyter: eds217_2025
format: 
    html:
        toc: true
        toc-depth: 3
        code-fold: show
---




## Quick Tour: The Data Science Workflow

**Most data science projects follow the same systematic approach.** Today we'll take a **quick tour** through all 9 steps using simple examples. This gives you the big picture before we dive deeper in the coming days!




```{mermaid}
flowchart LR
    A["1. Import<br/>📂"] --> B["2. Explore<br/>🔍"] --> C["3. Clean<br/>🧹"]
    C --> D["4. Filter<br/>🎯"] --> E["5. Sort<br/>📊"]
    E --> F["6. Transform<br/>🔄"] --> G["7. Group<br/>👥"]
    G --> H["8. Aggregate<br/>📈"] --> I["9. Visualize<br/>📊"]

    style A fill:#e1f5fe
    style B fill:#e8f5e8
    style C fill:#fff3e0
    style D fill:#f3e5f5
    style E fill:#e0f2f1
    style F fill:#fce4ec
    style G fill:#e8eaf6
    style H fill:#f1f8e9
    style I fill:#fff8e1
```




:::{.callout-important title="Session Goals"}
**Today**: Quick overview of all 9 steps with simple examples  
**Days 5-7**: Deep dive into specific steps with real data  
**End-of-day**: Practice the complete workflow yourself!
:::

## Getting Started

Create a new notebook called `Session_4C_Workflow_Tour.ipynb` and **type along** as we tour the data science workflow!

## Setup


```{python}
#| echo: true

import pandas as pd
import matplotlib.pyplot as plt
```

## Workflow Tour: 9 Simple Steps

Follow along and **type each step**. We'll use simple, short commands that are easy to type!

## 📂 Step 1: Import 
**Key Function**: `pd.read_csv()`


```{python}
#| echo: true

# Import data: read the CSV file into a DataFrame
df = pd.read_csv('../cheatsheets/ocean_temperatures.csv')
print("Data imported:", df.shape)
```
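
`pd.read_csv()` accepts either a file path or any file-like object. The minimal sketch below assumes a tiny made-up CSV held in memory with `io.StringIO` (the real `ocean_temperatures.csv` may have more columns). It shows that an empty field is read as `NaN`, which is exactly what Step 3 cleans up.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for ocean_temperatures.csv
# ('ocean' and 'temperature' are assumed from the later steps)
csv_text = """ocean,temperature
Pacific,18.5
Atlantic,16.2
Indian,
Pacific,21.0
"""

# read_csv works the same on a path or on a file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)                       # (rows, columns)
print(df["temperature"].isna().sum()) # the empty field was read as NaN
print(df["temperature"].dtype)        # float64, since NaN is a float value
```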

## 🔍 Step 2: Explore
**Key Function**: `df.head()`


```{python}
#| echo: true

# Explore data: show the first 5 rows
df.head()
```
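
`df.head()` is usually just the first of several quick checks. Here is a sketch on a small made-up DataFrame, with column names assumed to match the session's data, that shows a few other common exploration methods:

```python
import pandas as pd

# Small hypothetical dataset (not the session's CSV)
df = pd.DataFrame({
    "ocean": ["Pacific", "Atlantic", "Indian", "Pacific", "Atlantic"],
    "temperature": [18.5, 16.2, None, 21.0, 15.8],
})

print(df.head(3))                   # first 3 rows (the default is 5)
print(df.shape)                     # (rows, columns)
df.info()                           # dtype and non-null count for each column
print(df.describe())                # summary statistics for numeric columns
print(df["ocean"].value_counts())   # number of rows per category
```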

## 🧹 Step 3: Clean
**Key Function**: `df.dropna()`


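```{python}
#| echo: true

# Clean data: drop every row that has at least one missing value
df_clean = df.dropna()
df_clean.shape  # (rows, columns) remaining after cleaning
```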
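
With no arguments, `dropna()` drops any row that has at least one missing value, which can discard more data than you intend. The sketch below uses a small made-up DataFrame (not the session's CSV) to compare that default with `subset=` and with filling gaps using `fillna()`. Filling with the column mean is only one illustrative choice.

```python
import pandas as pd

# Hypothetical data with gaps in both columns
df = pd.DataFrame({
    "ocean": ["Pacific", "Atlantic", None, "Pacific"],
    "temperature": [18.5, None, 12.1, 21.0],
})

print(df.isna().sum())   # missing values per column

# Default: remove rows with a missing value in ANY column
all_complete = df.dropna()

# subset= only looks at the listed columns when deciding what to drop
has_temp = df.dropna(subset=["temperature"])

# Alternative to dropping: fill missing temperatures with the column mean
filled = df.fillna({"temperature": df["temperature"].mean()})

print(all_complete.shape, has_temp.shape)
```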

## 🎯 Step 4: Filter
**Key Function**: Boolean indexing `df[df['column'] == 'value']`


```{python}
#| echo: true

# Filter data: keep only rows where the ocean column equals 'Pacific'
filtered = df_clean[df_clean['ocean'] == 'Pacific']
filtered.head()
```
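
Boolean indexing works because a comparison such as `df['ocean'] == 'Pacific'` returns a Series of `True`/`False` values, one per row. `df[...]` then keeps only the `True` rows. Below is a small sketch on made-up data that assumes the same `ocean` and `temperature` columns. It shows how to combine conditions and how to match several values at once.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "ocean": ["Pacific", "Atlantic", "Indian", "Pacific", "Atlantic"],
    "temperature": [18.5, 16.2, 24.3, 21.0, 15.8],
})

# A comparison gives one True/False per row
is_pacific = df["ocean"] == "Pacific"

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
warm_pacific = df[(df["ocean"] == "Pacific") & (df["temperature"] > 20)]

# isin() keeps rows whose value appears in the list
pac_or_ind = df[df["ocean"].isin(["Pacific", "Indian"])]

print(is_pacific.tolist())
print(warm_pacific)
print(pac_or_ind)
```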

## 📊 Step 5: Sort
**Key Function**: `df.sort_values()`


```{python}
#| echo: true

# Sort data: warmest temperatures first
sorted_df = df_clean.sort_values('temperature', ascending=False)
sorted_df.head()
```
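
`sort_values()` can also sort by several columns at once, and it keeps the original row labels. Here is a minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "ocean": ["Pacific", "Atlantic", "Indian", "Pacific", "Atlantic"],
    "temperature": [18.5, 16.2, 24.3, 21.0, 15.8],
})

# Sort by ocean alphabetically, then by temperature high to low within each ocean
by_ocean_then_temp = df.sort_values(["ocean", "temperature"], ascending=[True, False])

# The original index labels travel with the rows; reset_index(drop=True) renumbers them 0..n-1
ranked = df.sort_values("temperature", ascending=False).reset_index(drop=True)

# nlargest() is a shortcut for "top n rows by one column"
top2 = df.nlargest(2, "temperature")

print(by_ocean_then_temp)
print(ranked)
print(top2)
```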

## 🔄 Step 6: Transform
**Key Function**: Create new columns from existing ones (`df['new'] = ...`)


```{python}
#| echo: true

# Transform data: add a Fahrenheit column computed from the Celsius temperature
df_clean['temp_f'] = df_clean['temperature'] * 9/5 + 32
df_clean[['temperature', 'temp_f']].head()
```
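
New columns are usually computed with vectorized column arithmetic, as in the Fahrenheit conversion above. The sketch below uses a made-up DataFrame to show two other common patterns. The first is a label column built from a condition with `np.where`, where the 20 °C threshold is an arbitrary example. The second is `assign()`, which returns a new DataFrame instead of changing the original.

```python
import numpy as np
import pandas as pd

# Hypothetical data, temperatures in °C
df = pd.DataFrame({
    "ocean": ["Pacific", "Atlantic", "Indian"],
    "temperature": [18.5, 16.2, 25.0],
})

# Arithmetic on a column is applied to every row at once
df["temp_f"] = df["temperature"] * 9 / 5 + 32

# np.where(condition, value_if_true, value_if_false)
# (20 °C is an arbitrary example threshold, not taken from the dataset)
df["category"] = np.where(df["temperature"] >= 20, "warm", "cool")

# assign() returns a NEW DataFrame; df itself gains no temp_k column
df_k = df.assign(temp_k=df["temperature"] + 273.15)

print(df)
print(df_k)
```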

## 👥 Step 7: Group
**Key Function**: `df.groupby()`


```{python}
#| echo: true

# Group data: split the rows by ocean
by_ocean = df_clean.groupby('ocean')
by_ocean.size()  # number of rows in each group
```
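
`groupby()` on its own only describes how to split the rows. You see results once you call a method on the grouped object. Here is a short sketch on made-up data that shows what the groups contain:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "ocean": ["Pacific", "Atlantic", "Indian", "Pacific", "Atlantic"],
    "temperature": [18.5, 16.2, 24.3, 21.0, 15.8],
})

by_ocean = df.groupby("ocean")

# size() counts the rows in each group (group keys are sorted by default)
print(by_ocean.size())

# Iterating yields (group key, sub-DataFrame) pairs
for ocean, group in by_ocean:
    print(ocean, len(group))

# get_group() returns the rows for a single key
pacific = by_ocean.get_group("Pacific")
print(pacific)
```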

## 📈 Step 8: Aggregate
**Key Function**: `.mean()`, `.sum()`, `.count()`


```{python}
#| echo: true

# Aggregate data: mean temperature for each ocean
avg_temps = by_ocean['temperature'].mean()
avg_temps
```
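
`.mean()`, `.sum()` and `.count()` each compute one statistic per group. To get several statistics at once, pass a list of function names to `agg()`. Named aggregation goes one step further and lets you choose the output column names. Here is a sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "ocean": ["Pacific", "Atlantic", "Indian", "Pacific", "Atlantic"],
    "temperature": [18.5, 16.2, 24.3, 21.0, 15.8],
})

# Several statistics per group in one call
# (count skips missing values, unlike size())
summary = df.groupby("ocean")["temperature"].agg(["mean", "min", "max", "count"])

# Named aggregation: output_name=(column, function)
named = df.groupby("ocean").agg(
    avg_temp=("temperature", "mean"),
    n_obs=("temperature", "count"),
)

print(summary)
print(named)
```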

## 📊 Step 9: Visualize
**Key Function**: `.plot()`


```{python}
#| echo: true

# Visualize data: bar chart of the average temperature per ocean
avg_temps.plot(kind='bar')
plt.title('Average Ocean Temperatures')
plt.ylabel('Temperature (°C)')
plt.show()
```
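
`Series.plot()` draws on a Matplotlib `Axes`, so you can also create the figure yourself and pass the axes in with `ax=`. This minimal sketch uses hypothetical averages in place of `avg_temps`. It assumes temperatures are in °C, as the Step 6 conversion implies.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical averages standing in for avg_temps from Step 8
avg_temps = pd.Series({"Atlantic": 16.0, "Indian": 24.3, "Pacific": 19.75})

# Creating the Axes explicitly makes labeling (or adding panels) straightforward
fig, ax = plt.subplots(figsize=(6, 4))
avg_temps.plot(kind="bar", ax=ax, color="steelblue", rot=0)  # rot=0 keeps labels horizontal

ax.set_title("Average Ocean Temperatures")
ax.set_xlabel("Ocean")
ax.set_ylabel("Temperature (°C)")

fig.tight_layout()
plt.show()
```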

## Summary

You just learned the **9-step data science workflow**:

1. **Import**: `pd.read_csv()`
2. **Explore**: `df.head()`
3. **Clean**: `df.dropna()`
4. **Filter**: `df[df['column'] == 'value']`
5. **Sort**: `df.sort_values()`
6. **Transform**: Create new columns from existing ones (`df['new'] = ...`)
7. **Group**: `df.groupby()`
8. **Aggregate**: `.mean()`, `.sum()`, `.count()`
9. **Visualize**: `.plot()`
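
Each step returns a new DataFrame or Series, so the steps listed above can also be chained into a single expression. This is a minimal sketch on made-up data, and the 16 °C cutoff is arbitrary. Note that sorting is applied after aggregation here, because sorting rows before `groupby()` would not change the group means.

```python
import io

import pandas as pd

# Hypothetical CSV standing in for ocean_temperatures.csv
csv_text = """ocean,temperature
Pacific,18.5
Atlantic,16.2
Indian,
Pacific,21.0
Atlantic,15.8
Indian,24.3
"""

df = pd.read_csv(io.StringIO(csv_text))   # 1. Import
print(df.head())                          # 2. Explore

result = (
    df.dropna()                                               # 3. Clean
    .loc[lambda d: d["temperature"] > 16]                     # 4. Filter (arbitrary 16 °C cutoff)
    .assign(temp_f=lambda d: d["temperature"] * 9 / 5 + 32)   # 6. Transform
    .groupby("ocean")["temp_f"]                               # 7. Group
    .mean()                                                   # 8. Aggregate
    .sort_values(ascending=False)                             # 5. Sort (applied to the result here)
)

ax = result.plot(kind="bar", title="Mean temperature (°F) by ocean")  # 9. Visualize
```

Chaining is a style choice. Separate named variables, as used in this session, are easier to inspect step by step while learning.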

## What's Next?

- **Day 5**: Practice filtering and cleaning in detail
- **Day 6**: Master grouping and aggregation
- **Day 7**: Create beautiful visualizations
- **End-of-day today**: Apply this workflow yourself!

::: {.center-text .body-text-xl .teal-text}
End interactive session 4C
:::