[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jkitchin/s26-06642/blob/main/dsmles/participation/participation-05-dimensionality-reduction.ipynb)

# Module 05: Dimensionality Reduction - Participation Exercises

## Exercise 5.1: Prediction - Variance Explained

**Type:** 🔮 Prediction (3 min)

You have a dataset with 10 features describing chemical compounds. You apply PCA.

**Predict:** What fraction of the total variance do you think PC1 will explain?

- [ ] < 20% (features are independent)
- [ ] 20-40% (some correlation)
- [ ] 40-60% (moderate correlation)
- [ ] > 60% (highly correlated features)

**Consider:** What does your answer imply about the "true" dimensionality of the data?

*Your prediction and reasoning:*
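
Once your prediction is written down, one way to test the reasoning behind the four options is a small simulation. The sketch below is not the compound dataset from the exercise: it builds 10 synthetic features that share a single hidden factor, and the `shared_weight` parameter, the `pc1_share` helper, and the sample size are all assumptions made for illustration. It reports PC1's share of the variance and how many components are needed to reach 90%.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pc1_share(shared_weight, n_samples=500, n_features=10, seed=0):
    """Return (PC1 variance ratio, number of components needed for 90% variance).

    shared_weight is a hypothetical knob: 0 gives independent features,
    values near 1 give features dominated by one common factor.
    """
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n_samples, 1))          # hidden factor shared by every feature
    noise = rng.normal(size=(n_samples, n_features))  # independent part of each feature
    X = shared_weight * latent + (1 - shared_weight) * noise

    X_scaled = StandardScaler().fit_transform(X)
    ratios = PCA().fit(X_scaled).explained_variance_ratio_

    # First index where the cumulative ratio reaches 0.90, converted to a count
    n_for_90 = int(np.searchsorted(np.cumsum(ratios), 0.90) + 1)
    return ratios[0], n_for_90


for w in (0.0, 0.3, 0.6, 0.9):
    share, k = pc1_share(w)
    print(f"shared_weight={w:.1f}  PC1 share={share:.2f}  components for 90%={k}")
```

Compare the printed shares with the bracket you chose. The second number is a rough stand-in for the "true" dimensionality the question asks about.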



## Exercise 5.2: Mini-Exercise - Interpret PCA Loadings

**Type:** 🔧 Mini-Exercise (7 min)

Analyze PCA loadings to understand what each component represents.

In [None]:
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Polymer property data (synthetic)
np.random.seed(42)
n = 100

# Create correlated features
strength = np.random.normal(50, 10, n)
hardness = strength * 0.8 + np.random.normal(0, 5, n)  # Positively correlated with strength
flexibility = 100 - strength + np.random.normal(0, 8, n)  # Anti-correlated with strength
density = np.random.normal(1.2, 0.2, n)  # Independent of the other features
cost = np.random.normal(10, 3, n)  # Independent of the other features

df = pd.DataFrame({
    'strength': strength,
    'hardness': hardness,
    'flexibility': flexibility,
    'density': density,
    'cost': cost
})

# Apply PCA
X_scaled = StandardScaler().fit_transform(df)
pca = PCA()
pca.fit(X_scaled)

# Look at loadings
loadings = pd.DataFrame(
    pca.components_.T,
    columns=[f'PC{i+1}' for i in range(pca.n_components_)],  # one column per fitted component
    index=df.columns
)
print("PCA Loadings:")
print(loadings.round(3))

print("\nVariance explained:", pca.explained_variance_ratio_.round(3))

# TASK: Interpret what PC1 and PC2 represent based on the loadings
# What physical meaning can you assign to each component?

*Your interpretation:*

- PC1 represents:
- PC2 represents:
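
When reading a loadings table, it can help to sort each component's features by the magnitude of their loading. The helper below is a minimal sketch: `top_loadings` is a hypothetical name, not part of scikit-learn, and the toy columns `a`, `b`, and `c` are made up so the block runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def top_loadings(pca, feature_names, component=0, k=3):
    """Return the k features with the largest absolute loading on one component."""
    weights = pd.Series(pca.components_[component], index=feature_names)
    order = weights.abs().sort_values(ascending=False).index
    return weights.loc[order].head(k)  # signed loadings, largest magnitude first


# Toy data (hypothetical): 'a' and 'b' move together, 'c' is unrelated
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    'a': a,
    'b': a + rng.normal(scale=0.3, size=200),
    'c': rng.normal(size=200),
})
toy_pca = PCA().fit(StandardScaler().fit_transform(toy))
print(top_loadings(toy_pca, toy.columns, component=0, k=2))
```

For the polymer data, the equivalent call would be `top_loadings(pca, df.columns, component=0)`. The overall sign of a component is arbitrary, so focus on which features share a sign and which oppose each other.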

## Exercise 5.3: Discussion - PCA vs t-SNE

**Type:** 💬 Discussion (5 min)

You need to visualize a high-dimensional dataset. When would you choose:

1. **PCA** over t-SNE?
2. **t-SNE** over PCA?
3. **Both** (for different purposes)?

**Consider:** interpretability, reproducibility, global vs local structure, computational cost.
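
To ground the reproducibility point, the sketch below fits each method twice on the same data. The dataset (three synthetic blobs from `make_blobs`), the perplexity, and the seeds are assumptions chosen only to keep the run fast. t-SNE is given a random initialization so that the effect of the seed is visible.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 120 points, 10 features, 3 clusters
X, _ = make_blobs(n_samples=120, n_features=10, centers=3, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# PCA with the exact SVD solver: two fits on the same data give the same projection
pca_a = PCA(n_components=2, svd_solver='full').fit_transform(X_scaled)
pca_b = PCA(n_components=2, svd_solver='full').fit_transform(X_scaled)

# t-SNE with random initialization: the embedding depends on the seed
tsne_a = TSNE(n_components=2, perplexity=20, init='random', random_state=0).fit_transform(X_scaled)
tsne_b = TSNE(n_components=2, perplexity=20, init='random', random_state=1).fit_transform(X_scaled)

print("PCA runs identical:  ", np.allclose(pca_a, pca_b))
print("t-SNE runs identical:", np.allclose(tsne_a, tsne_b))
```

The two t-SNE layouts may still show the same clusters even though the coordinates differ, which is worth raising in the discussion: axis positions and distances between clusters in a t-SNE plot do not carry the same meaning as PCA axes.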

*Discussion notes:*

