# 📊 Session 2 — Student Worksheet (Pandas + Plotly)
Work in small groups. Find a CSV dataset on GitHub, load it in Colab, and adapt commands from the workbook.
Be ready to walk through your code at the end.


## Part 1 — Find a Dataset (GitHub)

In [None]:
#csv_url = "https://tradingeconomics.com/matrix"
#csv_url

'https://github.com/phant0mZY/Indian-Macroeconomic-Indicators-Dataset/blob/main/finance.csv'

In [None]:
#data = 'https://drive.google.com/drive/my-drive'
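
A GitHub page URL that contains `/blob/` returns an HTML page rather than the CSV itself, so `pd.read_csv` generally cannot parse it. A minimal sketch of converting such a link to its raw-file form follows. The helper name `to_raw_github_url` is hypothetical, and it assumes the usual `github.com/<user>/<repo>/blob/<branch>/<path>` layout.

```python
def to_raw_github_url(url: str) -> str:
    """Convert a github.com 'blob' page URL to its raw.githubusercontent.com form.

    Hypothetical helper for illustration; URLs that are not blob links are
    returned unchanged.
    """
    prefix = "https://github.com/"
    if url.startswith(prefix) and "/blob/" in url:
        # Swap the host and drop only the first '/blob/' path segment
        rest = url[len(prefix):].replace("/blob/", "/", 1)
        return "https://raw.githubusercontent.com/" + rest
    return url


page_url = "https://github.com/phant0mZY/Indian-Macroeconomic-Indicators-Dataset/blob/main/finance.csv"
csv_url = to_raw_github_url(page_url)
print(csv_url)
```

The resulting `csv_url` can then be passed to `pd.read_csv(csv_url)`, assuming the repository is public and the file still exists at that path.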

## Part 2 — Load & Inspect

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
data = "/content/drive/MyDrive/macro_indicators.csv"

In [5]:
import pandas as pd

# Read CSV
df = pd.read_csv(data)
print(df.head())


         Country    GDP  GDP Growth  Interest Rate  Inflation Rate  \
0  United States  29185         3.8           4.25             2.9   
1          China  18744         1.1           3.00            -0.4   
2      Euro Area  16406         0.1           2.15             2.0   
3        Germany   4660        -0.3           2.15             2.2   
4          Japan   4026         0.5           0.50             2.7   

   Jobless Rate  Gov. Budget  Debt/GDP  Current Account  Population  
0           4.3         -6.4     124.3             -3.9      341.15  
1           5.3         -6.5      88.3              2.2     1408.00  
2           6.2         -3.1      87.4              2.6      351.38  
3           6.3         -2.8      62.5              5.7       83.58  
4           2.3         -2.3     236.7              4.7      120.65  


## Part 3 — Explore & Clean

In [6]:
df_15 = df.sort_values('GDP', ascending=False).head(15)
df_15

| | Country | GDP | GDP Growth | Interest Rate | Inflation Rate | Jobless Rate | Gov. Budget | Debt/GDP | Current Account | Population |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | United States | 29185 | 3.8 | 4.25 | 2.9 | 4.3 | -6.4 | 124.3 | -3.9 | 341.15 |
| 1 | China | 18744 | 1.1 | 3.0 | -0.4 | 5.3 | -6.5 | 88.3 | 2.2 | 1408.0 |
| 2 | Euro Area | 16406 | 0.1 | 2.15 | 2.0 | 6.2 | -3.1 | 87.4 | 2.6 | 351.38 |
| 3 | Germany | 4660 | -0.3 | 2.15 | 2.2 | 6.3 | -2.8 | 62.5 | 5.7 | 83.58 |
| 4 | Japan | 4026 | 0.5 | 0.5 | 2.7 | 2.3 | -2.3 | 236.7 | 4.7 | 120.65 |
| 5 | India | 3913 | 1.7 | 5.5 | 2.07 | 5.1 | -4.8 | 81.92 | -0.6 | 1398.6 |
| 6 | United Kingdom | 3644 | 0.3 | 4.0 | 3.8 | 4.7 | -4.8 | 95.9 | -2.7 | 69.23 |
| 7 | France | 3162 | 0.3 | 2.15 | 0.9 | 7.5 | -5.8 | 113.0 | 0.4 | 68.44 |
| 8 | Italy | 2373 | -0.1 | 2.15 | 1.6 | 6.0 | -3.4 | 135.3 | 1.1 | 58.93 |
| 9 | Canada | 2241 | -0.4 | 2.5 | 1.9 | 7.1 | -2.1 | 110.8 | -1.0 | 41.53 |
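
Sorting is only one part of exploring a dataset. The sketch below shows a few common cleaning checks (column types, missing values, numeric conversion). It uses three rows copied from the output above, with `GDP` stored as text and one value blanked out purely to demonstrate the checks; those two alterations are assumptions for illustration and are not present in the real file.

```python
import pandas as pd

# Three rows copied from the output above. GDP is stored as text and one
# value is blanked out on purpose (illustrative assumption, not real data).
sample = pd.DataFrame({
    "Country": ["United States", "China", "Japan"],
    "GDP": ["29185", "18744", None],
    "Debt/GDP": [124.3, 88.3, 236.7],
})

# 1. Column types: numeric columns that show up as text usually need conversion
print(sample.dtypes)

# 2. Count missing values per column
missing = sample.isna().sum()
print(missing)

# 3. Convert text to numbers; entries that cannot be parsed become NaN
sample["GDP"] = pd.to_numeric(sample["GDP"], errors="coerce")

# 4. Drop rows that lack a value in the column you plan to plot
clean = sample.dropna(subset=["GDP"])
print(clean)
```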


## Part 4 — Summarization

In [None]:
category_col = 'REPLACE_ME'  # set to a categorical column name from df.columns
numeric_col = 'REPLACE_ME'   # set to a numeric column name from df.columns

by_cat = df.groupby(category_col)[numeric_col].sum().sort_values(ascending=False)
by_cat.head(10)

KeyError: 'REPLACE_ME'
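
The `KeyError` above appears because the placeholders were never replaced with real column names. A sketch of a filled-in version follows. This dataset has no ready-made category column, so the `Group` column (BRICS vs. Other) is derived here as an assumption for illustration; the five rows are copied from the Part 3 output.

```python
import pandas as pd

# Five rows copied from the Part 3 output
df_small = pd.DataFrame({
    "Country": ["United States", "China", "Japan", "India", "United Kingdom"],
    "GDP": [29185, 18744, 4026, 3913, 3644],
})

brics = ["Brazil", "Russia", "India", "China", "South Africa"]
# Derived category column (illustrative assumption, not a column in the CSV)
df_small["Group"] = df_small["Country"].apply(
    lambda c: "BRICS" if c in brics else "Other"
)

category_col = "Group"
numeric_col = "GDP"

# Fail early with a readable message if a column name is wrong
for col in (category_col, numeric_col):
    if col not in df_small.columns:
        raise KeyError(f"{col!r} not found; available columns: {list(df_small.columns)}")

by_cat = (
    df_small.groupby(category_col)[numeric_col]
    .sum()
    .sort_values(ascending=False)
)
print(by_cat)
```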

## Part 5 — Visualization (Pandas + Plotly)

In [7]:
import plotly.express as px

bar_df_15 = df_15.reset_index()
fig = px.bar(bar_df_15, x='Country', y='GDP', title='Top 15 countries by GDP (interactive)')
fig.show()

In [27]:
import plotly.express as px

scatter_df = df.reset_index()
scatter_df['Size'] = scatter_df['Country'].apply(lambda x: 40 if x == 'Brazil' else 3)
fig = px.scatter(scatter_df,
                 x='Interest Rate',
                 y='Inflation Rate',
                 title = 'Countries',
                 color = 'Country',
                 size = 'Size')

fig.update_xaxes(range=[0, 20])  # x-axis from 0 to 20
fig.update_yaxes(range=[0, 20])  # y-axis from 0 to 20

fig.show()

In [30]:
scatter_df = df.reset_index().sort_values('Debt/GDP', ascending = False)
scatter_df['Size'] = scatter_df['Country'].apply(lambda x: 40 if x == 'Brazil' else 3)
fig = px.scatter(scatter_df,
                 x='Interest Rate',
                 y='Debt/GDP',
                 title = 'Countries',
                 color = 'Country',
                 size = 'Size')

fig.update_xaxes(range=[0, 20])  # x-axis from 0 to 20
#fig.update_yaxes(range=[0, 20])  # disabled: Debt/GDP values exceed 20, so let Plotly pick the y-range

fig.show()

In [21]:
import plotly.express as px

brics = ['Brazil', 'Russia', 'India', 'China', 'South Africa']

scatter_df = df.reset_index().sort_values('Debt/GDP', ascending=False)
scatter_df['Group'] = scatter_df['Country'].apply(lambda x: x if x in brics else 'Other')

# Make Brazil larger
scatter_df['Size'] = scatter_df['Country'].apply(lambda x: 200 if x == 'Brazil' else 30)

fig = px.scatter(
    scatter_df,
    x='Interest Rate',
    y='Debt/GDP',
    title='Countries',
    color='Group',
    size='Size',
    color_discrete_map={
        'Brazil': 'green',
        'Russia': 'red',
        'India': 'orange',
        'China': 'blue',
        'South Africa': 'purple',
        'Other': 'lightgray'
    },
    hover_name='Country'
)

fig.update_xaxes(range=[0, 20])
fig.show()

In [None]:
import plotly.express as px

bar_df = by_cat.head(10).reset_index()
fig = px.bar(bar_df, x=category_col, y=numeric_col, title=f'Top 10 {category_col} (interactive)')
fig.show()

NameError: name 'by_cat' is not defined

## Part 6 — Reflection
- What needed adjusting when you switched datasets?
- What was easier/harder vs the workbook?
- What context/limitations should a policymaker know?


## Part 7 — ⚡ Lightning Chart Demos — Teach-Out Instructions

### What is a Lightning Demo?
A **rapid-fire mini-presentation**: 2 minutes per group. Show **one chart** + **one insight**, not your whole notebook.

### Demo Format (2 minutes)
1. **Context (15 sec):** Dataset name + 1-sentence purpose.  
2. **Chart + Insight (75–90 sec):** Show interactive Plotly chart; state one clear finding; name one wrangling step.  
3. **Caveat or Next Step (15–30 sec):** One limitation OR next idea.  
4. **Pass the mic (10 sec):** End with *“Any questions about our chart?”*

### Unique Angles
- Aggregation choice (sum vs mean, raw vs normalized).  
- Data cleaning step (conversion, deduping).  
- Encoding (bar vs line vs scatter).  
- Interactivity (facet, hover, filter).  
- Comparability (per capita vs raw counts).  
- Temporal nuance (trend vs snapshot).  

👉 If another group showed your angle, pick a different one.
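
As one illustration of the comparability angle, the sketch below ranks countries by raw GDP and by GDP per capita. It assumes `GDP` is in billions of US dollars and `Population` is in millions; the worksheet does not state the units, so check them against the data source before relying on the numbers. The rows are copied from the Part 3 output.

```python
import pandas as pd

df_small = pd.DataFrame({
    "Country": ["United States", "China", "Japan", "India"],
    "GDP": [29185, 18744, 4026, 3913],                # assumed unit: USD billions
    "Population": [341.15, 1408.00, 120.65, 1398.6],  # assumed unit: millions
})

# billions / millions = thousands of USD per person, so multiply by 1000
df_small["GDP per capita (USD)"] = df_small["GDP"] / df_small["Population"] * 1000

raw_rank = df_small.sort_values("GDP", ascending=False)["Country"].tolist()
per_capita_rank = (
    df_small.sort_values("GDP per capita (USD)", ascending=False)["Country"].tolist()
)

print(raw_rank)
print(per_capita_rank)
```

Under these unit assumptions, China and Japan swap places between the two rankings, which is the kind of difference a lightning demo can highlight.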

### Flow for the Day
- Groups present **clustered by chart type** (bars, lines, scatter, other).  
- Instructor summarizes similarities/differences after each cluster.  
- 15 groups × 2 min = 30 minutes; allow about 40 minutes including transitions.

### Presenter Checklist
- [ ] Dataset URL loads in Colab  
- [ ] Columns chosen make sense  
- [ ] One **interactive Plotly figure** with title + labels  
- [ ] Insight stated in one sentence  
- [ ] One cleaning step mentioned  

### Audience Task
After each cluster, jot 1 sentence:  
- What was **similar** across charts?  
- What was **different** (aggregation, encoding, interactivity)?  
- Which chart best supported a **policy decision**—and why?

---
### Quick Rubric (3 pts)
- **Clarity of claim (1 pt)**  
- **Method fit (1 pt)**  
- **Insight/caveat (1 pt)**  
(Bonus +0.5 for meaningful interactivity)

---
### Minimal Code Patterns

**Top-N bar:**
```python
bar_df = df.groupby(category_col)[numeric_col].sum().nlargest(10).reset_index()
fig = px.bar(bar_df, x=category_col, y=numeric_col,
             title=f"Top 10 {category_col} by {numeric_col}")
fig.show()
```

**Time series:**
```python
ts = (df.groupby(time_col)[numeric_col]
        .sum()
        .reset_index()
        .sort_values(time_col))
fig = px.line(ts, x=time_col, y=numeric_col, markers=True,
              title=f"{numeric_col} over time")
fig.show()
```

**Scatter:**
```python
# trendline="ols" requires the statsmodels package to be installed
fig = px.scatter(df, x=x_col, y=y_col, trendline="ols",
                 title=f"{y_col} vs {x_col}")
fig.show()
```

---