
# 📊 Session 2 — Student Worksheet (Pandas + Plotly)
Work in small groups. Find a CSV dataset on GitHub, load it in Colab, and adapt commands from the workbook.
Be ready to walk through your code at the end.


## Part 1 — Find a Dataset (GitHub)

In [7]:
csv_url = '/content/drive/MyDrive/Python/FastFoodNutritionMenuV2.csv'
csv_url

'/content/drive/MyDrive/Python/FastFoodNutritionMenuV2.csv'
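Part 1 asks for a dataset on GitHub, while the cell above reads a file from a mounted Google Drive folder. If your group loads directly from GitHub instead, `pd.read_csv` needs the raw file address rather than the repository page address. The sketch below assumes a standard `github.com/<user>/<repo>/blob/<branch>/<path>` link; the helper name `to_raw_url` and the example link are hypothetical, not part of the workbook.

```python
def to_raw_url(github_url: str) -> str:
    """Convert a GitHub 'blob' page URL into the matching raw-file URL."""
    prefix = "https://github.com/"
    if not github_url.startswith(prefix) or "/blob/" not in github_url:
        # Leave anything that is not a GitHub page link unchanged.
        return github_url
    path = github_url[len(prefix):].replace("/blob/", "/", 1)
    return "https://raw.githubusercontent.com/" + path


# Hypothetical example link; substitute the dataset your group found.
page_url = "https://github.com/example-user/example-repo/blob/main/data/menu.csv"
raw_csv_url = to_raw_url(page_url)
print(raw_csv_url)
```

The resulting string can then be passed to `pd.read_csv` in place of the Drive path.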

## Part 2 — Load & Inspect

In [8]:
import pandas as pd

df = pd.read_csv(csv_url)
rows, cols = df.shape
rows, cols

(1148, 14)

## Part 3 — Explore & Clean

In [9]:
df.dtypes

df.isnull().sum().sort_values(ascending=False).head(10)

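| Column | Missing values |
|---|---|
| `Calories from\nFat` | 506 |
| `Weight Watchers\nPnts` | 261 |
| `Saturated Fat\n(g)` | 57 |
| `Fiber\n(g)` | 57 |
| `Trans Fat\n(g)` | 57 |
| `Total Fat\n(g)` | 57 |
| `Protein\n(g)` | 57 |
| `Carbs\n(g)` | 57 |
| `Calories` | 1 |
| `Cholesterol\n(mg)` | 1 |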
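Several column names in the output above appear to contain embedded line breaks (shown as `\n`), which makes them awkward to type when selecting columns. One possible cleaning step, sketched here on a small made-up frame rather than the real data, is to collapse all whitespace in the headers into single spaces.

```python
import pandas as pd

# Small stand-in frame with the same style of headers as the worksheet data.
sample = pd.DataFrame({
    "Total Fat\n(g)": [10, 20],
    "Calories from\nFat": [90, 180],
})

# Collapse any run of whitespace (including newlines) into a single space.
sample.columns = sample.columns.str.replace(r"\s+", " ", regex=True).str.strip()
print(list(sample.columns))
```

The same two-step call could be applied to `df.columns`, assuming the real headers do contain newline characters.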


In [6]:
# Run this cell before Part 1: csv_url points to a file on Google Drive.
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Part 4 — Summarization

In [15]:
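# Strip thousands separators, spaces, and unit suffixes before converting to numbers.
# 'kcal' is removed before 'cal' so that no stray 'k' is left behind.
df['Calories_clean'] = (df['Calories'].astype(str)
                        .str.replace(',', '')
                        .str.replace(' ', '')
                        .str.replace('kcal', '', case=False)
                        .str.replace('cal', '', case=False))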

df['Calories_clean'] = pd.to_numeric(df['Calories_clean'], errors='coerce')
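Because `errors='coerce'` silently turns anything unparseable into `NaN`, it can help to check which original values were lost. The sketch below runs the same style of cleaning chain on made-up values (not the real dataset) and lists the entries that had text originally but failed conversion.

```python
import pandas as pd

# Made-up values for illustration only.
raw = pd.Series(["1,040 cal", "250", "310 kcal", "N/A", None])

cleaned = pd.to_numeric(
    raw.astype(str)
       .str.replace(",", "")
       .str.replace(" ", "")
       .str.replace("kcal", "", case=False)
       .str.replace("cal", "", case=False),
    errors="coerce",
)

# Rows that had a value originally but became NaN after conversion.
failed = raw[raw.notna() & cleaned.isna()]
print(cleaned.tolist())
print(failed.tolist())
```

On the worksheet data, the equivalent check would compare `df['Calories']` with `df['Calories_clean']`.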

In [16]:
category_col = 'Company'
numeric_col = 'Calories_clean'  # use the cleaned numeric column, not the raw text column

by_cat_mean = df.groupby(category_col)[numeric_col].mean().sort_values(ascending=False)
by_cat_mean.head(10)

| Company | Calories_clean |
|---|---|
| Burger King | 359.189944 |
| Wendy’s | 322.5 |
| Taco Bell | 292.166667 |
| McDonald’s | 284.618902 |
| Pizza Hut | 253.378378 |
| KFC | 215.229358 |
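A mean on its own hides how many rows each group contributes and how skewed the values are. As a hedged sketch on made-up rows (the company names and numbers below are placeholders), `agg` can report the count and median alongside the mean so the ranking is easier to interpret.

```python
import pandas as pd

# Made-up rows; in the worksheet this would be df with 'Company' and 'Calories_clean'.
demo = pd.DataFrame({
    "Company": ["A", "A", "A", "B", "B"],
    "Calories_clean": [100.0, 200.0, 900.0, 300.0, None],
})

# count ignores missing values, so it shows how many items back each average.
summary = (demo.groupby("Company")["Calories_clean"]
               .agg(["count", "mean", "median"])
               .sort_values("mean", ascending=False))
print(summary)
```

Here company A ranks first on the mean because of one large value, while its median is lower than B's, which is the kind of caveat worth mentioning in a demo.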


## Part 5 — Visualization (Pandas + Plotly)

In [22]:
import plotly.express as px

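bar_df = by_cat_mean.head(10).reset_index()
# Build the title from the actual number of bars (this dataset has fewer than 10 companies).
fig = px.bar(bar_df, x='Company', y='Calories_clean',
             title=f'Top {len(bar_df)} Companies by Average Calories')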
fig.show()

## Part 6 — Reflection
- What needed adjusting when you switched datasets?
- What was easier/harder vs the workbook?
- What context/limitations should a policymaker know?


## Part 7 — ⚡ Lightning Chart Demos — Teach-Out Instructions

### What is a Lightning Demo?
A **rapid-fire mini-presentation**: 2 minutes per group. Show **one chart** + **one insight**, not your whole notebook.

### Demo Format (2 minutes)
1. **Context (15 sec):** Dataset name + 1-sentence purpose.  
2. **Chart + Insight (75–90 sec):** Show interactive Plotly chart; state one clear finding; name one wrangling step.  
3. **Caveat or Next Step (15–30 sec):** One limitation OR next idea.  
4. **Pass the mic (10 sec):** End with *“Any questions about our chart?”*

### Unique Angles
- Aggregation choice (sum vs mean, raw vs normalized).  
- Data cleaning step (conversion, deduping).  
- Encoding (bar vs line vs scatter).  
- Interactivity (facet, hover, filter).  
- Comparability (per capita vs raw counts).  
- Temporal nuance (trend vs snapshot).  

👉 If another group showed your angle, pick a different one.

### Flow for the Day
- Groups present **clustered by chart type** (bars, lines, scatter, other).  
- Instructor summarizes similarities/differences after each cluster.  
- 15 groups × 2 min = 30 minutes of demos; with transitions, plan for about 40 minutes.

### Presenter Checklist
- [ ] Dataset URL loads in Colab  
- [ ] Columns chosen make sense  
- [ ] One **interactive Plotly figure** with title + labels  
- [ ] Insight stated in one sentence  
- [ ] One cleaning step mentioned  

### Audience Task
After each cluster, jot 1 sentence:  
- What was **similar** across charts?  
- What was **different** (aggregation, encoding, interactivity)?  
- Which chart best supported a **policy decision**—and why?

---
### Quick Rubric (3 pts)
- **Clarity of claim (1 pt)**  
- **Method fit (1 pt)**  
- **Insight/caveat (1 pt)**  
(Bonus +0.5 for meaningful interactivity)

---
### Minimal Code Patterns

**Top-N bar:**
```python
# Assumes plotly.express is imported as px and category_col / numeric_col name columns in df.
bar_df = df.groupby(category_col)[numeric_col].sum().nlargest(10).reset_index()
fig = px.bar(bar_df, x=category_col, y=numeric_col,
             title=f"Top 10 {category_col} by {numeric_col}")
fig.show()
```

**Time series:**
```python
ts = (df.groupby(time_col)[numeric_col]
        .sum()
        .reset_index()
        .sort_values(time_col))
fig = px.line(ts, x=time_col, y=numeric_col, markers=True,
              title=f"{numeric_col} over time")
fig.show()
```
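If the time column is stored as text, sorting it orders the values alphabetically rather than chronologically. A minimal sketch, assuming dates in `month/day/year` form and using made-up values, parses the column with `pd.to_datetime` before grouping and sorting.

```python
import pandas as pd

# Made-up values with dates stored as text, as they often are in a CSV.
demo = pd.DataFrame({
    "month": ["10/01/2023", "09/01/2023", "01/01/2024"],
    "value": [3, 2, 5],
})

# Parse the text into timestamps so the sort is chronological, not alphabetical.
demo["month"] = pd.to_datetime(demo["month"], format="%m/%d/%Y")
ts = (demo.groupby("month")["value"]
          .sum()
          .reset_index()
          .sort_values("month"))
print(ts)
```

The `format` string is an assumption about the data; adjust it to match how dates are written in your dataset.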

**Scatter:**
```python
# trendline="ols" needs the statsmodels package; drop the argument if it is not installed.
fig = px.scatter(df, x=x_col, y=y_col, trendline="ols",
                 title=f"{y_col} vs {x_col}")
fig.show()
```

---