# Tutorial: Optimization with SciPy and Dimensionality Reduction with PCA
## DA5401W - Foundations of Machine Learning
### Instructor: Dr. Arun B Ayyar
### IIT Madras

---

## Instructions

This tutorial contains **hands-on problems** for:
1. **Optimization using SciPy** (Linear Programming, Nonlinear Optimization, Constrained Optimization)
2. **Dimensionality Reduction using PCA** (Feature extraction, visualization, reconstruction)

**How to use this notebook:**
- Each problem has a **Problem Statement** with data setup
- Try solving the problem yourself first
- Solutions are hidden - click **"Show Solution"** to reveal
- Run all cells to load the data and helper functions

**Learning Objectives:**
- Apply scipy.optimize for real-world optimization problems
- Understand when to use different optimization methods
- Implement PCA for dimensionality reduction
- Interpret principal components and explained variance
- Apply PCA to real datasets

In [None]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.optimize import minimize, linprog, LinearConstraint, NonlinearConstraint
from scipy import stats
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_breast_cancer, load_wine, fetch_olivetti_faces

# Set style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 11

# Set random seed
np.random.seed(42)

print("✓ All libraries imported successfully!")
print("✓ Ready to start the tutorial")

---
# Part 1: Optimization with SciPy

## Problem 1: Production Planning (Linear Programming)

### Problem Statement

A factory produces two products: **Product A** and **Product B**.

**Profit:**
- Product A: ₹50 per unit
- Product B: ₹40 per unit

**Resource Constraints:**
- **Labor hours**: Product A needs 2 hours, Product B needs 1 hour. Total available: 100 hours
- **Raw material**: Product A needs 1 kg, Product B needs 2 kg. Total available: 80 kg
- **Machine time**: Product A needs 1 hour, Product B needs 1 hour. Total available: 60 hours

**Question:** How many units of each product should be produced to maximize profit?

**Mathematical Formulation:**

Maximize: $Z = 50x_1 + 40x_2$

Subject to:
- $2x_1 + x_2 \leq 100$ (labor)
- $x_1 + 2x_2 \leq 80$ (raw material)
- $x_1 + x_2 \leq 60$ (machine time)
- $x_1, x_2 \geq 0$

In [None]:
# Data for Problem 1
# Objective coefficients: linprog minimizes c @ x, so negate the profits to maximize
c = [-50, -40]  # Negative because linprog minimizes

# Inequality constraint matrix (A_ub @ x <= b_ub)
A_ub = np.array([
    [2, 1],   # Labor constraint
    [1, 2],   # Raw material constraint
    [1, 1]    # Machine time constraint
])

b_ub = np.array([100, 80, 60])

# Bounds for variables (x1 >= 0, x2 >= 0)
bounds = [(0, None), (0, None)]

print("Problem 1 Data Loaded")
print("="*60)
print("Objective: Maximize 50*x1 + 40*x2")
print("\nConstraints:")
print("  2*x1 + 1*x2 <= 100  (Labor)")
print("  1*x1 + 2*x2 <= 80   (Raw material)")
print("  1*x1 + 1*x2 <= 60   (Machine time)")
print("  x1, x2 >= 0")
print("\n→ YOUR TASK: Use scipy.optimize.linprog to solve this problem")

In [None]:
# YOUR SOLUTION HERE
# Use: result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method='highs')



<details>
<summary><b>📖 Show Solution</b></summary>

```python
# Solution for Problem 1
from scipy.optimize import linprog

# Solve the linear program
result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method='highs')

print("Solution for Problem 1: Production Planning")
print("="*60)
print(f"Status: {result.message}")
print(f"\nOptimal Production:")
print(f"  Product A: {result.x[0]:.2f} units")
print(f"  Product B: {result.x[1]:.2f} units")
print(f"\nMaximum Profit: ₹{-result.fun:.2f}")  # Negative because we minimized

# Check resource utilization
print(f"\nResource Utilization:")
labor_used = 2*result.x[0] + result.x[1]
material_used = result.x[0] + 2*result.x[1]
machine_used = result.x[0] + result.x[1]
print(f"  Labor: {labor_used:.2f} / 100 hours ({labor_used/100*100:.1f}%)")
print(f"  Raw Material: {material_used:.2f} / 80 kg ({material_used/80*100:.1f}%)")
print(f"  Machine Time: {machine_used:.2f} / 60 hours ({machine_used/60*100:.1f}%)")

# Visualize the feasible region
x1 = np.linspace(0, 60, 400)
x2_labor = 100 - 2*x1
x2_material = (80 - x1) / 2
x2_machine = 60 - x1

plt.figure(figsize=(10, 8))
plt.plot(x1, x2_labor, 'r-', label='Labor: 2x₁ + x₂ ≤ 100', linewidth=2)
plt.plot(x1, x2_material, 'b-', label='Material: x₁ + 2x₂ ≤ 80', linewidth=2)
plt.plot(x1, x2_machine, 'g-', label='Machine: x₁ + x₂ ≤ 60', linewidth=2)

# Fill feasible region
x1_fill = np.linspace(0, 60, 400)
x2_upper = np.minimum(np.minimum(100 - 2*x1_fill, (80 - x1_fill)/2), 60 - x1_fill)
x2_upper = np.maximum(x2_upper, 0)
plt.fill_between(x1_fill, 0, x2_upper, alpha=0.2, color='yellow', label='Feasible Region')

# Plot optimal point
plt.plot(result.x[0], result.x[1], 'r*', markersize=20, label=f'Optimal: ({result.x[0]:.1f}, {result.x[1]:.1f})')

plt.xlim(0, 60)
plt.ylim(0, 60)
plt.xlabel('Product A (x₁)', fontsize=12)
plt.ylabel('Product B (x₂)', fontsize=12)
plt.title('Linear Programming: Production Planning', fontsize=14, fontweight='bold')
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
```

**Expected Answer:**
- Product A: 40 units
- Product B: 20 units
- Maximum Profit: ₹2800
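
As a cross-check on the `linprog` answer: for a bounded feasible region, an LP optimum is always attained at a vertex, so with two variables we can intersect every pair of constraint lines, keep the feasible intersections, and pick the most profitable one. This brute-force sketch only scales to tiny problems; it is here to make the geometry of the plot concrete. It also shows that all three resource constraints are active at (40, 20), so the optimal vertex is degenerate.

```python
import itertools
import numpy as np

# All constraints as G @ x <= h; x1 >= 0 and x2 >= 0 are written as -x_i <= 0
G = np.array([
    [2.0, 1.0],    # labor
    [1.0, 2.0],    # raw material
    [1.0, 1.0],    # machine time
    [-1.0, 0.0],   # x1 >= 0
    [0.0, -1.0],   # x2 >= 0
])
h = np.array([100.0, 80.0, 60.0, 0.0, 0.0])
profit = np.array([50.0, 40.0])

vertices = []
for i, j in itertools.combinations(range(len(h)), 2):
    A = G[[i, j]]
    if abs(np.linalg.det(A)) < 1e-12:
        continue  # parallel lines do not meet in a single point
    x = np.linalg.solve(A, h[[i, j]])
    if np.all(G @ x <= h + 1e-9):  # keep only feasible intersections
        vertices.append(x)

# Round away solver noise and turn -0.0 into 0.0 before removing duplicates
vertices = np.unique(np.round(np.array(vertices), 9) + 0.0, axis=0)
best = max(vertices, key=lambda v: profit @ v)

print("Feasible vertices:\n", vertices)
print("Best vertex:", best, "profit:", profit @ best)
print("Active constraints:", np.isclose(G @ best, h))
```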

</details>

---
## Problem 2: Portfolio Optimization (Quadratic Programming)

### Problem Statement

You want to invest in 3 stocks. Historical data shows:

**Expected Returns:**
- Stock 1: 12% per year
- Stock 2: 10% per year
- Stock 3: 8% per year

**Covariance Matrix** (risk):
```
        Stock1  Stock2  Stock3
Stock1   0.04    0.01    0.00
Stock2   0.01    0.03    0.01
Stock3   0.00    0.01    0.02
```

**Question:** Find the portfolio weights that **minimize risk** (variance) while achieving **at least 10% expected return**.

**Mathematical Formulation:**

Minimize: $\frac{1}{2}w^T \Sigma w$ (half the portfolio variance; the factor $\frac{1}{2}$ does not change the minimizer)

Subject to:
- $\mu^T w \geq 0.10$ (minimum return)
- $\sum w_i = 1$ (fully invested)
- $w_i \geq 0$ (no short selling)

In [None]:
# Data for Problem 2
# Expected returns
mu = np.array([0.12, 0.10, 0.08])

# Covariance matrix
Sigma = np.array([
    [0.04, 0.01, 0.00],
    [0.01, 0.03, 0.01],
    [0.00, 0.01, 0.02]
])

print("Problem 2 Data Loaded")
print("="*60)
print("Expected Returns:", mu)
print("\nCovariance Matrix:")
print(Sigma)
print("\n→ YOUR TASK: Find optimal portfolio weights")
print("   - Minimize portfolio variance")
print("   - Expected return >= 10%")
print("   - Weights sum to 1")
print("   - No short selling (w >= 0)")

In [None]:
# YOUR SOLUTION HERE
# Hint: Define objective function as: lambda w: 0.5 * w @ Sigma @ w
# Both constraints are linear in w, so LinearConstraint covers them; use bounds for w >= 0



<details>
<summary><b>📖 Show Solution</b></summary>

```python
# Solution for Problem 2
from scipy.optimize import minimize, LinearConstraint

# Objective: minimize portfolio variance
def portfolio_variance(w):
    return 0.5 * w.T @ Sigma @ w

# Gradient of objective
def portfolio_variance_grad(w):
    return Sigma @ w

# Constraints
# 1. Sum of weights = 1
constraint_sum = LinearConstraint(np.ones(3), 1, 1)

# 2. Expected return >= 0.10
constraint_return = LinearConstraint(mu, 0.10, np.inf)

# Bounds: no short selling
bounds = [(0, 1) for _ in range(3)]

# Initial guess
w0 = np.array([1/3, 1/3, 1/3])

# Solve
result = minimize(portfolio_variance, w0, method='SLSQP', jac=portfolio_variance_grad,
                  constraints=[constraint_sum, constraint_return], bounds=bounds)

print("Solution for Problem 2: Portfolio Optimization")
print("="*60)
print(f"Status: {result.message}")
print(f"\nOptimal Weights:")
for i, w in enumerate(result.x, 1):
    print(f"  Stock {i}: {w*100:.2f}%")

expected_return = mu @ result.x
portfolio_risk = np.sqrt(result.x @ Sigma @ result.x)

print(f"\nPortfolio Metrics:")
print(f"  Expected Return: {expected_return*100:.2f}%")
print(f"  Portfolio Risk (Std Dev): {portfolio_risk*100:.2f}%")
print(f"  Sharpe Ratio (assuming rf=2%): {(expected_return-0.02)/portfolio_risk:.3f}")

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Pie chart of weights
axes[0].pie(result.x, labels=[f'Stock {i+1}' for i in range(3)], autopct='%1.1f%%',
            colors=['#ff9999','#66b3ff','#99ff99'], startangle=90)
axes[0].set_title('Optimal Portfolio Allocation', fontsize=14, fontweight='bold')

# Bar chart
stocks = ['Stock 1', 'Stock 2', 'Stock 3']
x_pos = np.arange(len(stocks))
axes[1].bar(x_pos, result.x*100, color=['#ff9999','#66b3ff','#99ff99'], edgecolor='black')
axes[1].set_xlabel('Stock', fontsize=12)
axes[1].set_ylabel('Weight (%)', fontsize=12)
axes[1].set_title('Portfolio Weights', fontsize=14, fontweight='bold')
axes[1].set_xticks(x_pos)
axes[1].set_xticklabels(stocks)
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()
```

**Expected Answer:**
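- Stock 1: 40%, Stock 2: 20%, Stock 3: 40%
- Expected Return: 10% (the return constraint is active: the minimum-variance portfolio without it earns only about 9.4%)
- Portfolio Risk (Std Dev): ~11.8% (variance 0.014)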
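
Why 40/20/40? Without the return constraint, the minimum-variance portfolio is proportional to $\Sigma^{-1}\mathbf{1}$, which gives weights (2/7, 1/7, 4/7) and an expected return of only about 9.4%, so the 10% constraint must be active at the optimum. Treating both constraints as equalities, and assuming (checked at the end) that no weight hits its lower bound of 0, the optimum solves a single linear KKT system. A minimal NumPy sketch:

```python
import numpy as np

mu = np.array([0.12, 0.10, 0.08])
Sigma = np.array([
    [0.04, 0.01, 0.00],
    [0.01, 0.03, 0.01],
    [0.00, 0.01, 0.02],
])
target = 0.10

# Equality constraints A @ w = b: weights sum to 1, expected return equals the target
A = np.vstack([np.ones(3), mu])
b = np.array([1.0, target])

# Stationarity of 0.5 w^T Sigma w - nu^T (A w - b) gives Sigma w - A^T nu = 0
kkt = np.block([
    [Sigma, -A.T],
    [A, np.zeros((2, 2))],
])
rhs = np.concatenate([np.zeros(3), b])
sol = np.linalg.solve(kkt, rhs)
w, nu = sol[:3], sol[3:]

print("weights:", np.round(w, 4))
print("return:", mu @ w, " std dev:", np.sqrt(w @ Sigma @ w))
print("multiplier on the return constraint:", nu[1])
print("bounds inactive (all w > 0):", bool(np.all(w > 0)))
```

The multiplier on the return constraint comes out positive (0.2), so raising the required return would raise the risk, which confirms the inequality is binding in the right direction. Since all weights are positive, the no-short-selling bounds are inactive and this is also the solution of the full problem solved by SLSQP.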

</details>

---
## Problem 3: Rosenbrock Function Minimization

### Problem Statement

The **Rosenbrock function** is a famous test function for optimization algorithms:

$$f(x, y) = (1-x)^2 + 100(y-x^2)^2$$

It has a narrow parabolic valley with the global minimum at $(1, 1)$ where $f(1, 1) = 0$.

**Question:** 
1. Minimize the Rosenbrock function starting from $(-1.5, 2.5)$
2. Compare different optimization methods: `'BFGS'`, `'Nelder-Mead'`, `'CG'`
3. Which method converges fastest?
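
Hand-derived gradients are easy to get wrong. Differentiating term by term gives $\partial f/\partial x = -2(1-x) - 400x(y-x^2)$ and $\partial f/\partial y = 200(y-x^2)$. A quick way to check such a formula is `scipy.optimize.check_grad`, which compares it with a finite-difference estimate. A small sketch (the test points are arbitrary):

```python
import numpy as np
from scipy.optimize import check_grad

def rosenbrock(p):
    """f(x, y) = (1 - x)^2 + 100 (y - x^2)^2"""
    x, y = p
    return (1 - x)**2 + 100 * (y - x**2)**2

def rosenbrock_grad(p):
    x, y = p
    # Chain rule on 100 (y - x^2)^2 gives 100 * 2 (y - x^2) * (-2x) = -400 x (y - x^2)
    dfdx = -2 * (1 - x) - 400 * x * (y - x**2)
    dfdy = 200 * (y - x**2)
    return np.array([dfdx, dfdy])

rng = np.random.default_rng(0)
test_points = [np.array([-1.5, 2.5]), np.array([1.0, 1.0]), rng.uniform(-2, 2, size=2)]
errors = []
for p in test_points:
    # check_grad returns the 2-norm of (analytic gradient - finite-difference gradient)
    errors.append(check_grad(rosenbrock, rosenbrock_grad, p))
    print(f"point {np.round(p, 3)}: gradient error {errors[-1]:.2e}")

# The gradient vanishes at the global minimum (1, 1)
print("gradient at (1, 1):", rosenbrock_grad(np.array([1.0, 1.0])))
```

Errors around 1e-5 are normal here: forward differences are only accurate to roughly the square root of machine precision, scaled by the (large) curvature of this function.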

In [None]:
# Data for Problem 3
def rosenbrock(x):
    """Rosenbrock function"""
    return (1 - x[0])**2 + 100*(x[1] - x[0]**2)**2

def rosenbrock_grad(x):
    """Gradient of Rosenbrock function"""
    dfdx = -2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2)
    dfdy = 200*(x[1] - x[0]**2)
    return np.array([dfdx, dfdy])

# Starting point
x0 = np.array([-1.5, 2.5])

print("Problem 3 Data Loaded")
print("="*60)
print("Function: f(x,y) = (1-x)² + 100(y-x²)²")
print(f"Starting point: {x0}")
print(f"Function value at start: {rosenbrock(x0):.2f}")
print("\n→ YOUR TASK: Minimize using different methods and compare")

In [None]:
# YOUR SOLUTION HERE
# Try methods: 'BFGS', 'Nelder-Mead', 'CG'



<details>
<summary><b>📖 Show Solution</b></summary>

```python
# Solution for Problem 3
methods = ['BFGS', 'Nelder-Mead', 'CG']
results = {}

print("Solution for Problem 3: Rosenbrock Function")
print("="*60)

for method in methods:
    if method in ['BFGS', 'CG']:
        result = minimize(rosenbrock, x0, method=method, jac=rosenbrock_grad)
    else:
        result = minimize(rosenbrock, x0, method=method)
    
    results[method] = result
    
    print(f"\nMethod: {method}")
    print(f"  Solution: x = {result.x[0]:.6f}, y = {result.x[1]:.6f}")
    print(f"  Function value: {result.fun:.10f}")
    print(f"  Iterations: {result.nit}")
    print(f"  Function evaluations: {result.nfev}")
    print(f"  Success: {result.success}")

# Visualize the function and the final point reached by each method
x = np.linspace(-2, 2, 400)
y = np.linspace(-1, 3, 400)
X, Y = np.meshgrid(x, y)
Z = (1 - X)**2 + 100*(Y - X**2)**2

plt.figure(figsize=(14, 10))

# Contour plot
levels = np.logspace(-1, 3.5, 20)
contour = plt.contour(X, Y, Z, levels=levels, cmap='viridis', alpha=0.6)
plt.colorbar(contour, label='f(x, y)')

# Plot starting point
plt.plot(x0[0], x0[1], 'ko', markersize=12, label='Start', zorder=5)

# Plot optimal point
plt.plot(1, 1, 'r*', markersize=20, label='Global Minimum (1,1)', zorder=5)

# Plot solutions
colors = ['blue', 'green', 'orange']
for (method, result), color in zip(results.items(), colors):
    plt.plot(result.x[0], result.x[1], 'o', color=color, markersize=10, 
             label=f'{method}: ({result.x[0]:.3f}, {result.x[1]:.3f})', zorder=5)

plt.xlabel('x', fontsize=12)
plt.ylabel('y', fontsize=12)
plt.title('Rosenbrock Function: Optimization Methods Comparison', fontsize=14, fontweight='bold')
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Comparison table
comparison_data = []
for method, result in results.items():
    comparison_data.append({
        'Method': method,
        'Final x': f'{result.x[0]:.6f}',
        'Final y': f'{result.x[1]:.6f}',
        'f(x,y)': f'{result.fun:.2e}',
        'Iterations': result.nit,
        'Func Evals': result.nfev
    })

df_comparison = pd.DataFrame(comparison_data)
print("\n" + "="*60)
print("Comparison Summary:")
print(df_comparison.to_string(index=False))
```

**Expected Answer:**
- All methods should converge to approximately (1.0, 1.0)
- BFGS typically converges fastest (fewest iterations)
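- Nelder-Mead uses no gradient, so it typically needs more function evaluations (for BFGS and CG, `nfev` counts only function values; gradient calls are reported separately in `njev`)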
- CG is in between
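
The solution above plots only where each method ends up. To record the full path, note that `minimize` calls `callback` once per iteration with the current point. A sketch using SciPy's built-in `rosen` and `rosen_der`, which for two variables are the same function and gradient as in this problem. It also prints `njev`, since comparing `nfev` alone flatters the gradient-based methods:

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([-1.5, 2.5])

def make_recorder(path):
    """Return a callback that appends each iterate xk to `path`."""
    def record(xk):
        path.append(np.copy(xk))
    return record

paths, results = {}, {}
for method in ["BFGS", "CG", "Nelder-Mead"]:
    path = [x0.copy()]
    jac = None if method == "Nelder-Mead" else rosen_der
    res = minimize(rosen, x0, method=method, jac=jac, callback=make_recorder(path))
    paths[method], results[method] = np.array(path), res
    # OptimizeResult is a dict, so .get covers Nelder-Mead, which has no njev entry
    print(f"{method:12s} iterates: {len(path) - 1:4d}  nfev: {res.nfev:4d}  "
          f"njev: {res.get('njev', 0):4d}  final: {np.round(res.x, 4)}")

# To draw the trajectories on the contour plot from the solution above:
# for m, p in paths.items(): plt.plot(p[:, 0], p[:, 1], '.-', label=m)
```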

</details>

---
# Part 2: Dimensionality Reduction with PCA

## Problem 4: Breast Cancer Dataset - Feature Extraction

### Problem Statement

The **Breast Cancer Wisconsin dataset** contains 569 breast tumor samples with 30 features computed from digitized images of fine-needle aspirates:
- Features include: radius, texture, perimeter, area, smoothness, compactness, concavity, etc.
- Each of 10 base measurements is reported three ways: mean, standard error, and "worst" (mean of the 3 largest values), giving 30 features
- Examples: mean radius, radius error, worst radius

There are 2 classes: **Malignant** (cancerous) and **Benign** (non-cancerous).

**Questions:**
1. Apply PCA to reduce from 30D to 2D
2. How much variance is explained by the first 2 principal components?
3. Visualize the data in the 2D PCA space
4. Which original features contribute most to PC1 and PC2?
5. Are the two classes (malignant vs benign) separable in 2D?

In [None]:
# Data for Problem 4
cancer = load_breast_cancer()
X_cancer = cancer.data
y_cancer = cancer.target
feature_names_cancer = cancer.feature_names
target_names_cancer = cancer.target_names

print("Problem 4 Data Loaded: Breast Cancer Dataset")
print("="*60)
print(f"Number of samples: {X_cancer.shape[0]}")
print(f"Number of features: {X_cancer.shape[1]}")
print(f"\nFirst 5 feature names:")
for i, name in enumerate(feature_names_cancer[:5]):
    print(f"  {i+1}. {name}")
print(f"  ... and {len(feature_names_cancer)-5} more features")
print(f"\nTarget names: {target_names_cancer}")
print(f"  - Malignant: {np.sum(y_cancer == 0)} samples")
print(f"  - Benign: {np.sum(y_cancer == 1)} samples")
print(f"\nFirst 5 samples (first 5 features):")
print(pd.DataFrame(X_cancer[:5, :5], columns=feature_names_cancer[:5]))
print("\n→ YOUR TASK: Apply PCA and answer the questions above")

In [None]:
# YOUR SOLUTION HERE
# Steps:
# 1. Standardize the data using StandardScaler
# 2. Apply PCA with n_components=2
# 3. Transform the data
# 4. Analyze explained variance
# 5. Visualize



<details>
<summary><b>📖 Show Solution</b></summary>

```python
# Solution for Problem 4
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Step 1: Standardize
scaler = StandardScaler()
X_cancer_scaled = scaler.fit_transform(X_cancer)

# Step 2: Apply PCA
pca = PCA(n_components=2)
X_cancer_pca = pca.fit_transform(X_cancer_scaled)

print("Solution for Problem 4: Breast Cancer Dataset PCA")
print("="*60)

# Question 2: Explained variance
print(f"\nExplained Variance Ratio:")
print(f"  PC1: {pca.explained_variance_ratio_[0]*100:.2f}%")
print(f"  PC2: {pca.explained_variance_ratio_[1]*100:.2f}%")
print(f"  Total: {sum(pca.explained_variance_ratio_)*100:.2f}%")

# Question 4: Component loadings (top 5 for each PC)
print(f"\nTop 5 Features Contributing to Each PC:")
pc1_top_idx = np.argsort(np.abs(pca.components_[0]))[-5:][::-1]
pc2_top_idx = np.argsort(np.abs(pca.components_[1]))[-5:][::-1]

print(f"\nPC1:")
for idx in pc1_top_idx:
    print(f"  {feature_names_cancer[idx]}: {pca.components_[0, idx]:.3f}")

print(f"\nPC2:")
for idx in pc2_top_idx:
    print(f"  {feature_names_cancer[idx]}: {pca.components_[1, idx]:.3f}")

# Question 3 & 5: Visualize
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Scatter plot in PCA space
colors = ['red', 'blue']
for i, (color, target_name) in enumerate(zip(colors, target_names_cancer)):
    mask = y_cancer == i
    axes[0].scatter(X_cancer_pca[mask, 0], X_cancer_pca[mask, 1], 
                    color=color, label=target_name, alpha=0.6, edgecolor='black', s=60)

axes[0].set_xlabel(f'PC1 ({pca.explained_variance_ratio_[0]*100:.1f}% variance)', fontsize=12)
axes[0].set_ylabel(f'PC2 ({pca.explained_variance_ratio_[1]*100:.1f}% variance)', fontsize=12)
axes[0].set_title('Breast Cancer Dataset in PCA Space', fontsize=14, fontweight='bold')
axes[0].legend(fontsize=11)
axes[0].grid(True, alpha=0.3)

# Biplot (showing top features only for clarity)
for i, (color, target_name) in enumerate(zip(colors, target_names_cancer)):
    mask = y_cancer == i
    axes[1].scatter(X_cancer_pca[mask, 0], X_cancer_pca[mask, 1], 
                    color=color, label=target_name, alpha=0.3, s=40)

# Add loading vectors for the top 3 features of each PC (up to 6 arrows)
scale = 8
top_features_idx = list(set(pc1_top_idx[:3].tolist() + pc2_top_idx[:3].tolist()))
for idx in top_features_idx:
    axes[1].arrow(0, 0, pca.components_[0, idx]*scale, pca.components_[1, idx]*scale,
                  head_width=0.3, head_length=0.3, fc='darkgreen', ec='darkgreen', linewidth=2)
    # Shorten feature names
    short_name = feature_names_cancer[idx].replace('mean ', '').replace('worst ', 'w_')
    axes[1].text(pca.components_[0, idx]*scale*1.15, pca.components_[1, idx]*scale*1.15,
                 short_name, fontsize=9, ha='center', 
                 bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.7))

axes[1].set_xlabel(f'PC1 ({pca.explained_variance_ratio_[0]*100:.1f}% variance)', fontsize=12)
axes[1].set_ylabel(f'PC2 ({pca.explained_variance_ratio_[1]*100:.1f}% variance)', fontsize=12)
axes[1].set_title('PCA Biplot: Data + Top Feature Loadings', fontsize=14, fontweight='bold')
axes[1].legend(fontsize=11)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Answer to Question 5
print("\nQuestion 5: Are malignant and benign tumors separable in 2D?")
print("  → Largely yes: the two classes form distinct clusters with some overlap.")
print("  → PC1 roughly tracks tumor size and shape irregularity, and malignant tumors")
print("    sit at one end of it (a PC's sign is arbitrary, so check the plot).")
print("  → PCA never saw the labels, yet its leading PCs capture discriminative structure.")
```

**Expected Answer:**
- PC1 explains ~44% variance, PC2 explains ~19%
- Total: ~63% variance retained
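- PC1 has same-sign loadings on almost all features; the largest come from concavity, concave points, compactness, and size features (perimeter, radius, area)
- PC2 is dominated by the fractal-dimension features, which load opposite to the size features (radius, perimeter, area)
- Malignant and benign tumors show good separation with some overlap
- PC1 roughly orders tumors by size and irregularity, and larger, more irregular tumors tend to be malignant (which end of PC1 that is depends on the arbitrary sign of the component)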
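
Question 5 can be made quantitative. One option (our choice, not part of the original solution) is the cross-validated accuracy of a linear classifier trained on the first k principal components. Putting `StandardScaler`, `PCA`, and `LogisticRegression` in one pipeline refits the scaling and the PCA inside each fold, so nothing from the test fold leaks into them. The same loop doubles as a simple way to choose `n_components` by cross-validation for a downstream task:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

scores = {}
for k in [1, 2, 5, 10]:
    # Scaling and PCA are fitted inside each CV fold as part of the pipeline
    model = make_pipeline(StandardScaler(), PCA(n_components=k), LogisticRegression())
    scores[k] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{k:2d} components: mean 5-fold accuracy = {scores[k]:.3f}")
```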

</details>

---
## Problem 5: Olivetti Faces - Image Reconstruction

### Problem Statement

The **Olivetti Faces dataset** contains 400 grayscale face images of 40 different people (10 images per person). Each image is 64×64 pixels = 4096 features.

**Questions:**
1. Apply PCA and create a **scree plot** to determine how many components to keep
2. How many components are needed to retain 90% of variance?
3. Reduce to that many components and reconstruct the images
4. Visualize original vs reconstructed images
5. What is the compression ratio achieved?

In [None]:
# Data for Problem 5
faces = fetch_olivetti_faces()
X_faces = faces.data
y_faces = faces.target

print("Problem 5 Data Loaded: Olivetti Faces")
print("="*60)
print(f"Number of samples: {X_faces.shape[0]}")
print(f"Number of features (pixels): {X_faces.shape[1]}")
print(f"Image shape: 64x64")
print(f"Number of people: {len(np.unique(y_faces))}")

# Show some example images
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for i, ax in enumerate(axes.flat):
    ax.imshow(X_faces[i].reshape(64, 64), cmap='gray')
    ax.set_title(f'Person {y_faces[i]}')
    ax.axis('off')
plt.suptitle('Sample Face Images', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("\n→ YOUR TASK: Apply PCA for image compression and reconstruction")

In [None]:
# YOUR SOLUTION HERE
# Steps:
# 1. Standardize the data
# 2. Apply PCA with all components first
# 3. Create scree plot
# 4. Find number of components for 90% variance
# 5. Reconstruct images



<details>
<summary><b>📖 Show Solution</b></summary>

```python
# Solution for Problem 5

# Step 1: Pixel values are already scaled to [0, 1]. PCA centers the data itself,
# so running it on the raw pixels is also common (the classic eigenfaces setup).
# Here we standardize each pixel to follow the same recipe as Problems 4 and 6.
scaler_faces = StandardScaler()
X_faces_scaled = scaler_faces.fit_transform(X_faces)

# Step 2: Apply PCA with all components
pca_full = PCA()
X_faces_pca_full = pca_full.fit_transform(X_faces_scaled)

print("Solution for Problem 5: Olivetti Faces PCA")
print("="*60)

# Question 2: Components for 90% variance
cumsum_var = np.cumsum(pca_full.explained_variance_ratio_)
n_components_90 = np.argmax(cumsum_var >= 0.90) + 1

print(f"\nComponents needed for 90% variance: {n_components_90}")
print(f"Actual variance retained: {cumsum_var[n_components_90-1]*100:.2f}%")

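# Question 5: Compression ratio per image (4096 pixel values -> k PC scores);
# this ignores the cost of storing the k x 4096 components and the mean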
original_size = X_faces.shape[1]  # 4096 features
compressed_size = n_components_90
compression_ratio = original_size / compressed_size

print(f"\nCompression:")
print(f"  Original dimensions: {original_size}")
print(f"  Compressed dimensions: {compressed_size}")
print(f"  Compression ratio: {compression_ratio:.2f}x")

# Step 3: Scree plot
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Individual variance
axes[0].bar(range(1, min(51, len(pca_full.explained_variance_ratio_)+1)), 
            pca_full.explained_variance_ratio_[:50], color='steelblue', edgecolor='black')
axes[0].axhline(y=0.02, color='r', linestyle='--', label='2% threshold')
axes[0].set_xlabel('Principal Component', fontsize=12)
axes[0].set_ylabel('Explained Variance Ratio', fontsize=12)
axes[0].set_title('Scree Plot: Individual Variance (First 50 PCs)', fontsize=14, fontweight='bold')
axes[0].legend(fontsize=11)
axes[0].grid(True, alpha=0.3, axis='y')

# Cumulative variance
axes[1].plot(range(1, len(cumsum_var)+1), cumsum_var*100, 'b-', linewidth=2)
axes[1].axhline(y=90, color='r', linestyle='--', linewidth=2, label='90% threshold')
axes[1].axvline(x=n_components_90, color='g', linestyle='--', linewidth=2, 
                label=f'{n_components_90} components')
axes[1].scatter([n_components_90], [cumsum_var[n_components_90-1]*100], 
                color='red', s=100, zorder=5)
axes[1].set_xlabel('Number of Components', fontsize=12)
axes[1].set_ylabel('Cumulative Explained Variance (%)', fontsize=12)
axes[1].set_title('Cumulative Variance Explained', fontsize=14, fontweight='bold')
axes[1].legend(fontsize=11)
axes[1].grid(True, alpha=0.3)
axes[1].set_xlim(0, min(200, len(cumsum_var)))

plt.tight_layout()
plt.show()

# Step 4 & 5: Reconstruct images
pca_compressed = PCA(n_components=n_components_90)
X_faces_compressed = pca_compressed.fit_transform(X_faces_scaled)
X_faces_reconstructed = pca_compressed.inverse_transform(X_faces_compressed)
X_faces_reconstructed = scaler_faces.inverse_transform(X_faces_reconstructed)

# Visualize original vs reconstructed
n_samples = 5
fig, axes = plt.subplots(2, n_samples, figsize=(15, 6))

for i in range(n_samples):
    # Original
    axes[0, i].imshow(X_faces[i].reshape(64, 64), cmap='gray')
    axes[0, i].set_title(f'Original\nPerson {y_faces[i]}')
    axes[0, i].axis('off')
    
    # Reconstructed
    axes[1, i].imshow(X_faces_reconstructed[i].reshape(64, 64), cmap='gray')
    axes[1, i].set_title(f'Reconstructed\n({n_components_90} PCs)')
    axes[1, i].axis('off')

plt.suptitle(f'Face Reconstruction with PCA ({n_components_90} components, {compression_ratio:.1f}x compression)', 
             fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

# Visualize eigenfaces (principal components as images)
print("\nBonus: Visualizing Eigenfaces (First 10 Principal Components)")
fig, axes = plt.subplots(2, 5, figsize=(15, 6))
for i, ax in enumerate(axes.flat):
    eigenface = pca_compressed.components_[i].reshape(64, 64)
    ax.imshow(eigenface, cmap='gray')
    ax.set_title(f'PC{i+1}\n({pca_full.explained_variance_ratio_[i]*100:.1f}%)')
    ax.axis('off')
plt.suptitle('Eigenfaces: Principal Components as Images', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

# Calculate reconstruction error
mse = np.mean((X_faces - X_faces_reconstructed)**2)
print(f"\nReconstruction Error (MSE): {mse:.6f}")
```

**Expected Answer:**
- Need approximately 100-150 components for 90% variance
- Compression ratio: ~27-40x per image (4096 / k), before counting the storage for the components themselves
- Reconstructed faces are very similar to originals
- Eigenfaces show the most important facial features
- First few eigenfaces capture lighting, face shape, and key facial features
- Higher-order eigenfaces capture finer details
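
The ratio 4096 / k counts only the k PC scores stored per image. Decoding also needs the k × 4096 component matrix and the 4096-value mean, and with only 400 images that overhead is large. A sketch of the bookkeeping, run on synthetic low-rank data of the same shape (400 × 4096) so it works without downloading the faces; `k = 100` is an arbitrary illustrative choice. It also checks that `inverse_transform` is simply scores times components plus the mean, which holds for `PCA` without whitening:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n, d, k = 400, 4096, 100   # hypothetical sizes: 400 images of 64 x 64 pixels, k PCs kept

# Synthetic stand-in for the faces: rank-20 signal plus a little noise
X = rng.normal(size=(n, 20)) @ rng.normal(size=(20, d)) + 0.01 * rng.normal(size=(n, d))

pca = PCA(n_components=k).fit(X)
Z = pca.transform(X)                          # n x k scores

# Without whitening, reconstruction is scores @ components + mean
X_manual = Z @ pca.components_ + pca.mean_
print("matches inverse_transform:", np.allclose(X_manual, pca.inverse_transform(Z)))

original_numbers = n * d                      # every pixel of every image
compressed_numbers = n * k + k * d + d        # scores + components + mean
print(f"per-image ratio:     {d / k:.2f}x")
print(f"whole-dataset ratio: {original_numbers / compressed_numbers:.2f}x")
```

With these sizes the per-image figure is about 41x, while the whole-dataset figure is only about 3.6x; the gap narrows as the number of images grows relative to k.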

</details>

---
## Problem 6: Wine Dataset - Feature Importance

### Problem Statement

The **Wine dataset** contains chemical analysis of 178 wines with 13 features:
- Alcohol, Malic acid, Ash, Alkalinity of ash, Magnesium, Total phenols, Flavanoids, 
  Nonflavanoid phenols, Proanthocyanins, Color intensity, Hue, OD280/OD315, Proline

There are 3 wine cultivars (classes).

**Questions:**
1. Apply PCA and reduce to 2D
2. Which original features are most important for PC1?
3. Which original features are most important for PC2?
4. Create a biplot showing both data points and feature loadings
5. Can you identify which features distinguish the wine cultivars?

In [None]:
# Data for Problem 6
wine = load_wine()
X_wine = wine.data
y_wine = wine.target
feature_names_wine = wine.feature_names
target_names_wine = wine.target_names

print("Problem 6 Data Loaded: Wine Dataset")
print("="*60)
print(f"Number of samples: {X_wine.shape[0]}")
print(f"Number of features: {X_wine.shape[1]}")
print(f"\nFeatures:")
for i, name in enumerate(feature_names_wine, 1):
    print(f"  {i}. {name}")
print(f"\nTarget names: {target_names_wine}")
print("\n→ YOUR TASK: Apply PCA and analyze feature importance")

In [None]:
# YOUR SOLUTION HERE
# Steps:
# 1. Standardize
# 2. Apply PCA with 2 components
# 3. Analyze loadings
# 4. Create biplot



<details>
<summary><b>📖 Show Solution</b></summary>

```python
# Solution for Problem 6

# Step 1: Standardize
scaler_wine = StandardScaler()
X_wine_scaled = scaler_wine.fit_transform(X_wine)

# Step 2: Apply PCA
pca_wine = PCA(n_components=2)
X_wine_pca = pca_wine.fit_transform(X_wine_scaled)

print("Solution for Problem 6: Wine Dataset PCA")
print("="*60)

print(f"\nExplained Variance:")
print(f"  PC1: {pca_wine.explained_variance_ratio_[0]*100:.2f}%")
print(f"  PC2: {pca_wine.explained_variance_ratio_[1]*100:.2f}%")
print(f"  Total: {sum(pca_wine.explained_variance_ratio_)*100:.2f}%")

# Step 3: Analyze loadings
loadings_wine = pd.DataFrame(
    pca_wine.components_.T,
    columns=['PC1', 'PC2'],
    index=feature_names_wine
)
loadings_wine['PC1_abs'] = np.abs(loadings_wine['PC1'])
loadings_wine['PC2_abs'] = np.abs(loadings_wine['PC2'])

print("\nFeature Loadings:")
print(loadings_wine[['PC1', 'PC2']].to_string())

# Question 2 & 3: Most important features
print("\nTop 3 features for PC1:")
top_pc1 = loadings_wine.nlargest(3, 'PC1_abs')[['PC1']]
print(top_pc1.to_string())

print("\nTop 3 features for PC2:")
top_pc2 = loadings_wine.nlargest(3, 'PC2_abs')[['PC2']]
print(top_pc2.to_string())

# Step 4: Biplot
fig, axes = plt.subplots(1, 2, figsize=(18, 7))

# Scatter plot
colors = ['red', 'green', 'blue']
for i, (color, target_name) in enumerate(zip(colors, target_names_wine)):
    mask = y_wine == i
    axes[0].scatter(X_wine_pca[mask, 0], X_wine_pca[mask, 1], 
                    color=color, label=target_name, alpha=0.7, edgecolor='black', s=80)

axes[0].set_xlabel(f'PC1 ({pca_wine.explained_variance_ratio_[0]*100:.1f}% variance)', fontsize=12)
axes[0].set_ylabel(f'PC2 ({pca_wine.explained_variance_ratio_[1]*100:.1f}% variance)', fontsize=12)
axes[0].set_title('Wine Dataset in PCA Space', fontsize=14, fontweight='bold')
axes[0].legend(fontsize=11)
axes[0].grid(True, alpha=0.3)

# Biplot with loadings
for i, (color, target_name) in enumerate(zip(colors, target_names_wine)):
    mask = y_wine == i
    axes[1].scatter(X_wine_pca[mask, 0], X_wine_pca[mask, 1], 
                    color=color, label=target_name, alpha=0.3, s=50)

# Add loading vectors (scaled for visibility)
scale = 3
for i, feature in enumerate(feature_names_wine):
    # Only show most important features to avoid clutter
    if loadings_wine.iloc[i]['PC1_abs'] > 0.2 or loadings_wine.iloc[i]['PC2_abs'] > 0.2:
        axes[1].arrow(0, 0, pca_wine.components_[0, i]*scale, pca_wine.components_[1, i]*scale,
                      head_width=0.15, head_length=0.15, fc='black', ec='black', linewidth=2)
        # Shorten feature names for readability
        short_name = feature.replace('od280/od315_of_diluted_wines', 'OD ratio')
        short_name = short_name.replace('nonflavanoid_phenols', 'nonflav_phenols')
        axes[1].text(pca_wine.components_[0, i]*scale*1.15, pca_wine.components_[1, i]*scale*1.15,
                     short_name, fontsize=9, ha='center', 
                     bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.7))

axes[1].set_xlabel(f'PC1 ({pca_wine.explained_variance_ratio_[0]*100:.1f}% variance)', fontsize=12)
axes[1].set_ylabel(f'PC2 ({pca_wine.explained_variance_ratio_[1]*100:.1f}% variance)', fontsize=12)
axes[1].set_title('PCA Biplot: Important Feature Loadings', fontsize=14, fontweight='bold')
axes[1].legend(fontsize=11)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Question 5: Distinguishing features
print("\nQuestion 5: Features that distinguish wine cultivars:")
print("  → PC1 (horizontal axis) is dominated by:")
print("    - Flavanoids, Total phenols, OD280/OD315 ratio")
print("    - These separate class_0 (high values) from class_2 (low values), with class_1 in between")
print("  → PC2 (vertical axis) is influenced by:")
print("    - Color intensity, Alcohol, Proline")
print("    - These separate class_1 (lower values) from class_0 and class_2")
```

**Expected Answer:**
- PC1 explains ~36% variance, PC2 explains ~19%
- PC1: Flavanoids, Total phenols, OD280/OD315 ratio are most important
- PC2: Color intensity, Alcohol, Proline are most important
- The three wine cultivars are well-separated in 2D PCA space
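
The biplot draws the raw rows of `pca_wine.components_`, stretched by a hand-picked `scale`. One common alternative scaling (a convention, not the only one) multiplies each loading by the standard deviation of its PC and divides by the feature's standard deviation; the result is exactly the correlation between the feature and the PC scores, so arrow lengths become interpretable on a -1 to 1 scale. A sketch that computes these correlation loadings and checks them against `np.corrcoef`:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

wine = load_wine()
X_scaled = StandardScaler().fit_transform(wine.data)
pca = PCA(n_components=2).fit(X_scaled)
scores = pca.transform(X_scaled)
n_features = X_scaled.shape[1]

# loading * sqrt(PC variance) / feature std (both with ddof=1, as in explained_variance_)
feature_std = X_scaled.std(axis=0, ddof=1)
corr_loadings = pca.components_.T * np.sqrt(pca.explained_variance_) / feature_std[:, None]

# Direct check: Pearson correlation between each feature and each PC score
direct = np.corrcoef(X_scaled.T, scores.T)[:n_features, n_features:]
print("matches np.corrcoef:", np.allclose(corr_loadings, direct))

for name, (r1, r2) in zip(wine.feature_names, corr_loadings):
    print(f"{name:30s} r(PC1)={r1:+.2f}  r(PC2)={r2:+.2f}")
```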

</details>

---
## Summary and Key Takeaways

### Optimization with SciPy

1. **Linear Programming** (`linprog`):
   - For problems with linear objective and linear constraints
   - Use `method='highs'`, the recommended solver (and the default in recent SciPy releases)
   - Remember to negate coefficients when maximizing

2. **Nonlinear Optimization** (`minimize`):
   - Choose method based on problem characteristics:
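     - `'BFGS'`: Fast quasi-Newton method; uses the gradient (approximated by finite differences if `jac` is not given)
     - `'CG'`: Nonlinear conjugate gradient; uses the gradient and very little memory
     - `'Nelder-Mead'`: No gradient needed, robust but slower
     - `'SLSQP'`: For problems with bounds and equality/inequality constraints
   - Start from a sensible initial guess: these are local methods, so on non-convex problems the result can depend on `x0`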
   - Use analytical gradients when possible for speed

3. **Constraints**:
   - Use `LinearConstraint` for linear constraints
   - Use `NonlinearConstraint` for nonlinear constraints
   - Simple variable bounds go in the separate `bounds` argument
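
`NonlinearConstraint` is imported at the top of this notebook but not used in the problems above. A minimal sketch on a toy problem of our own: find the point of the unit disk $x^2 + y^2 \leq 1$ closest to $(2, 1)$. The exact answer is $(2, 1)/\sqrt{5} \approx (0.894, 0.447)$, which makes the result easy to check. It uses `method='trust-constr'`, the `minimize` method built around `LinearConstraint`/`NonlinearConstraint` objects, with BFGS approximations for second derivatives:

```python
import numpy as np
from scipy.optimize import BFGS, NonlinearConstraint, minimize

# Toy problem: squared distance to the point (2, 1)
def objective(v):
    return (v[0] - 2.0)**2 + (v[1] - 1.0)**2

def objective_grad(v):
    return np.array([2.0 * (v[0] - 2.0), 2.0 * (v[1] - 1.0)])

# Nonlinear inequality -inf <= x^2 + y^2 <= 1; Jacobian by finite differences,
# Hessian approximated with BFGS updates
unit_disk = NonlinearConstraint(lambda v: v[0]**2 + v[1]**2, -np.inf, 1.0,
                                jac="2-point", hess=BFGS())

res = minimize(objective, x0=np.array([0.0, 0.0]), method="trust-constr",
               jac=objective_grad, hess=BFGS(), constraints=[unit_disk])

print("solution:   ", np.round(res.x, 4), " norm:", round(float(np.linalg.norm(res.x)), 4))
print("closed form:", np.round(np.array([2.0, 1.0]) / np.sqrt(5), 4))
```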

### PCA for Dimensionality Reduction

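1. **Standardize** features measured in different units before PCA (using `StandardScaler`); PCA itself only centers the data. For features on a common scale, such as pixel intensities, centering alone is often enough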

2. **Choosing number of components**:
   - Use scree plot (elbow method)
   - Cumulative variance threshold (e.g., 90%, 95%)
   - Cross-validation for downstream tasks

3. **Interpretation**:
   - Explained variance ratio: how much information each PC captures
   - Component loadings: which original features contribute to each PC
   - Biplots: visualize both data and feature relationships

4. **Applications**:
   - Visualization (reduce to 2D or 3D)
   - Feature extraction for machine learning
   - Noise reduction
   - Data compression
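
To see what `PCA` computes under the hood: center the data and take its SVD $X_c = U S V^T$. The rows of $V^T$ are the principal directions, $S^2/(n-1)$ are the variances along them, and $X_c V = US$ are the scores. A from-scratch sketch on the standardized Wine data, compared with scikit-learn (directions are only defined up to sign, so absolute values are compared):

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_wine().data)
n = X.shape[0]

# PCA by hand: center, then take the thin SVD of the centered data
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_variance = S**2 / (n - 1)           # variance along each principal direction
explained_variance_ratio = explained_variance / explained_variance.sum()
scores = Xc @ Vt.T                            # projections onto the PCs, equal to U * S

pca = PCA().fit(X)
print("variance ratios match:", np.allclose(explained_variance_ratio, pca.explained_variance_ratio_))
# Principal directions are defined only up to sign, so compare absolute values
print("top-3 directions match:", np.allclose(np.abs(Vt[:3]), np.abs(pca.components_[:3]), atol=1e-6))
print("cumulative variance:", np.round(np.cumsum(explained_variance_ratio), 3))
```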

### Best Practices

- **Optimization**: Start simple, validate results, check convergence
- **PCA**: Understand what variance means in your domain
- **Both**: Always visualize your results!

---
