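# 04. Mastering the SVM Soft Margin

## Goals
- Fully understand the **motivation and need** for slack variables
- Master the mathematical formulation of the Soft Margin
- The role and meaning of the C parameter
- The natural extension from the Hard Margin to the Soft Margin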

---

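## 1. Limitations of the Hard Margin

### 1.1 Problems

Requirement of the **Hard Margin SVM**:
$$y_i(w^T x_i + b) \geq 1 \quad \forall i$$

**Problems**:
1. If the data are **not linearly separable**, no solution exists (no $(w, b)$ satisfies every constraint)
2. A single **outlier** can move the entire hyperplane substantially
3. **Overfitting**: the model fits the training data too rigidly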
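Problem 1 can be checked mechanically: the Hard Margin constraints are linear in $(w, b)$, so deciding whether any solution exists is a linear feasibility problem. The sketch below assumes SciPy's `linprog` with the HiGHS backend; `hard_margin_feasible`, `X_sep` and `X_bad` are hypothetical names chosen for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def hard_margin_feasible(X, y):
    """Return True if some (w, b) satisfies y_i (w^T x_i + b) >= 1 for all i.

    Variables z = [w_1, ..., w_d, b] with a zero objective. Each constraint
    y_i (w^T x_i + b) >= 1 is rewritten as -y_i [x_i, 1] @ z <= -1 to match
    linprog's A_ub @ z <= b_ub form.
    """
    n, d = X.shape
    A_ub = -y[:, None] * np.hstack([X, np.ones((n, 1))])
    b_ub = -np.ones(n)
    res = linprog(c=np.zeros(d + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (d + 1), method="highs")
    return res.status == 0  # 0: a feasible point was found, 2: infeasible

X_sep = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -2.0], [-1.0, -3.0]])
y_sep = np.array([1, 1, -1, -1])
# A -1 point at the midpoint of two +1 points: no linear separator can exist
X_bad = np.vstack([X_sep, [[2.5, 1.5]]])
y_bad = np.hstack([y_sep, [-1]])

print(hard_margin_feasible(X_sep, y_sep))  # True
print(hard_margin_feasible(X_bad, y_bad))  # False
```

The second case is infeasible for a simple reason: any affine $f$ evaluated at the midpoint of two points is the average of its values there, so it cannot be $\geq 1$ at both $+1$ points and $\leq -1$ at their midpoint.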

### 1.2 Solution

**Soft Margin**: allow some points to **violate** the margin

But:
- Impose a **penalty** on each violation
- Balance a **trade-off** between maximizing the margin and minimizing the penalty

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.svm import SVC

sns.set_style('whitegrid')
np.random.seed(42)

# Generate data that is not linearly separable
# Class +1
X_pos = np.random.randn(30, 2) + np.array([2, 2])
y_pos = np.ones(30)

# Class -1
X_neg = np.random.randn(30, 2) + np.array([-2, -2])
y_neg = -np.ones(30)

# Add outliers
X_outlier_pos = np.array([[-2, -1]])  # positive outlier in the negative region
y_outlier_pos = np.array([1])
X_outlier_neg = np.array([[2, 1]])    # negative outlier in the positive region
y_outlier_neg = np.array([-1])

# Full dataset
X = np.vstack([X_pos, X_neg, X_outlier_pos, X_outlier_neg])
y = np.hstack([y_pos, y_neg, y_outlier_pos, y_outlier_neg])

# Visualization
plt.figure(figsize=(10, 8))
plt.scatter(X_pos[:, 0], X_pos[:, 1], c='red', marker='o', s=100, label='Class +1', edgecolors='k')
plt.scatter(X_neg[:, 0], X_neg[:, 1], c='blue', marker='s', s=100, label='Class -1', edgecolors='k')
plt.scatter(X_outlier_pos[:, 0], X_outlier_pos[:, 1], c='red', marker='*', s=500,
            label='Outlier (+1)', edgecolors='black', linewidths=2)
plt.scatter(X_outlier_neg[:, 0], X_outlier_neg[:, 1], c='blue', marker='*', s=500,
            label='Outlier (-1)', edgecolors='black', linewidths=2)

plt.xlabel('$x_1$', fontsize=14)
plt.ylabel('$x_2$', fontsize=14)
plt.title('Non-separable Data (with Outliers)', fontsize=16, fontweight='bold')
plt.legend(fontsize=12)
plt.grid(alpha=0.3)
plt.axis('equal')
plt.show()

print("Problem: the outliers make perfect linear separation impossible!")
print("→ Hard Margin SVM cannot find a solution.")
print("→ We need Soft Margin SVM!")
```

## 2. Introducing the Slack Variable (ξ)

### 2.1 Motivation

**Idea**: measure **how much** each data point violates the constraint

Original constraint:
$$y_i(w^T x_i + b) \geq 1$$

Introduce a slack variable $\xi_i \geq 0$:
$$y_i(w^T x_i + b) \geq 1 - \xi_i$$
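A quick numerical sketch of why this relaxation always has a solution: for *any* hyperplane, setting $\xi_i = \max(0, 1 - y_i(w^T x_i + b))$ satisfies every relaxed constraint. The random data, random labels and the helper `minimal_slack` are hypothetical, chosen only to illustrate the point.

```python
import numpy as np

def minimal_slack(w, b, X, y):
    """Smallest xi_i >= 0 that satisfies y_i (w^T x_i + b) >= 1 - xi_i."""
    functional_margin = y * (X @ w + b)
    return np.maximum(0.0, 1.0 - functional_margin)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = np.where(rng.random(50) < 0.5, -1.0, 1.0)   # random labels: not separable
w, b = rng.normal(size=2), 0.3                   # an arbitrary hyperplane

xi = minimal_slack(w, b, X, y)
# Every relaxed constraint holds (small tolerance for floating-point rounding)
print(np.all(y * (X @ w + b) >= 1 - xi - 1e-12))
print(np.all(xi >= 0), xi.sum())
```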

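### 2.2 Meaning of ξ

- $\xi_i = 0$: the original constraint is satisfied (on or outside the margin)
- $0 < \xi_i < 1$: inside the margin but **correctly** classified
- $\xi_i = 1$: exactly on the decision boundary
- $\xi_i > 1$: **misclassified** (on the wrong side of the decision boundary)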

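### 2.3 Geometric Interpretation

The "degree of violation" of point $x_i$, i.e., the smallest $\xi_i \geq 0$ satisfying the relaxed constraint (the value it takes at the optimum):
$$\xi_i = \max(0, 1 - y_i(w^T x_i + b))$$

This is the **hinge loss**! Geometrically, a violating point falls short of its own margin boundary $y_i(w^T x + b) = 1$ by a distance of $\xi_i / \|w\|$.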

```python
# Visualize the slack variable

# Hinge loss
def hinge_loss(margin):
    """Hinge loss: max(0, 1 - margin)"""
    return np.maximum(0, 1 - margin)

# Values of y * (w^T x + b)
margin_values = np.linspace(-2, 3, 100)
loss_values = hinge_loss(margin_values)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Hinge loss plot
ax = axes[0]
ax.plot(margin_values, loss_values, 'b-', linewidth=3, label='ξ = max(0, 1 - yf(x))')
ax.axhline(0, color='k', linestyle='-', linewidth=0.5)
ax.axvline(1, color='r', linestyle='--', linewidth=2, label='Margin boundary')
ax.fill_between(margin_values, 0, loss_values, alpha=0.3)
ax.set_xlabel('$y_i(w^T x_i + b)$', fontsize=14)
ax.set_ylabel('Slack $\\xi_i$', fontsize=14)
ax.set_title('Hinge Loss (Slack Variable)', fontsize=14, fontweight='bold')
ax.legend(fontsize=12)
ax.grid(alpha=0.3)
ax.set_ylim(0, 3)

# Example points
ax = axes[1]
example_points = [
    {'margin': 2.0, 'xi': 0, 'label': 'Correct (outside margin)', 'color': 'green'},
    {'margin': 0.5, 'xi': 0.5, 'label': 'Inside margin (correct)', 'color': 'orange'},
    {'margin': -0.5, 'xi': 1.5, 'label': 'Misclassified', 'color': 'red'},
]

x_pos = np.arange(len(example_points))
xi_values = [p['xi'] for p in example_points]
colors = [p['color'] for p in example_points]

bars = ax.bar(x_pos, xi_values, color=colors, alpha=0.7, edgecolor='black', linewidth=2)
ax.axhline(1, color='r', linestyle='--', linewidth=2, label='ξ = 1 (decision boundary)')
ax.set_xlabel('Point type', fontsize=14)
ax.set_ylabel('Slack $\\xi$', fontsize=14)
ax.set_title('Slack Variable Values', fontsize=14, fontweight='bold')
ax.set_xticks(x_pos)
ax.set_xticklabels([p['label'] for p in example_points], rotation=15, ha='right')
ax.legend(fontsize=12)
ax.grid(alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("=== Interpreting the Slack Variable ===")
for i, p in enumerate(example_points):
    print(f"{i+1}. {p['label']}:")
    print(f"   y(w^T x + b) = {p['margin']:.1f}, ξ = {p['xi']:.1f}")
```

## 3. Soft Margin Primal Problem

### 3.1 Optimization Problem

**Goal**: maximize the margin + minimize the slack penalty

$$\begin{align}
\min_{w, b, \xi} \quad & \frac{1}{2}||w||^2 + C\sum_{i=1}^{n} \xi_i \\
\text{subject to} \quad & y_i(w^T x_i + b) \geq 1 - \xi_i, \quad i = 1, \ldots, n \\
& \xi_i \geq 0, \quad i = 1, \ldots, n
\end{align}$$
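The primal is a small quadratic program, so for a toy dataset it can be handed directly to a general-purpose solver. A minimal sketch, assuming SciPy's SLSQP and using scikit-learn's `SVC(kernel="linear")` only as a cross-check; the toy data (two Gaussian clouds plus one outlier) and the helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.svm import SVC

# Hypothetical toy data: two Gaussian clouds plus one +1 outlier in the -1 cloud
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(10, 2)) + 2,
               rng.normal(size=(10, 2)) - 2,
               [[-1.5, -1.5]]])
y = np.hstack([np.ones(10), -np.ones(10), [1.0]])
n, d = X.shape
C = 1.0

# Decision vector z = [w (d entries), b, xi (n entries)]
def primal_objective(z):
    w, xi = z[:d], z[d + 1:]
    return 0.5 * w @ w + C * xi.sum()

def margin_constraints(z):
    # SLSQP 'ineq' constraints mean fun(z) >= 0: y_i (w^T x_i + b) - 1 + xi_i >= 0
    w, b, xi = z[:d], z[d], z[d + 1:]
    return y * (X @ w + b) - 1 + xi

bounds = [(None, None)] * (d + 1) + [(0, None)] * n    # xi_i >= 0
z0 = np.concatenate([np.zeros(d + 1), np.ones(n)])     # feasible start: w = 0, b = 0, xi = 1
res = minimize(primal_objective, z0, method="SLSQP", bounds=bounds,
               constraints={"type": "ineq", "fun": margin_constraints},
               options={"maxiter": 500})
w_qp, b_qp = res.x[:d], res.x[d]

# Cross-check against scikit-learn's linear SVC with the same C
svm = SVC(kernel="linear", C=C, tol=1e-6).fit(X, y)
svc_obj = 0.5 * svm.coef_[0] @ svm.coef_[0] + C * np.maximum(0, 1 - y * svm.decision_function(X)).sum()
print("primal QP :", w_qp, b_qp, res.fun)
print("SVC       :", svm.coef_[0], svm.intercept_[0], svc_obj)
```

A generic solver like this scales poorly with $n$ (one slack variable per point); it is only meant to make the formulation concrete.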

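### 3.2 Role of the C Parameter

**C**: regularization parameter. It weights the slack penalty, so a larger C means *weaker* regularization.

- **Large C** (e.g., C = 1000):
  - The slack penalty is large
  - Violations are minimized → close to the Hard Margin
  - Risk of overfitting

- **Small C** (e.g., C = 0.01):
  - The slack penalty is small
  - Margin maximization takes priority → many violations are allowed
  - Risk of underfitting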

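### 3.3 Trade-off

$$\underbrace{\frac{1}{2}||w||^2}_{\text{margin maximization}} + \underbrace{C\sum_i \xi_i}_{\text{error minimization}}$$

Since every misclassified point has $\xi_i > 1$, $\sum_i \xi_i$ is an upper bound on the number of training errors.

What **C controls**:
- Do we want a large margin?
- Or do we want fewer training errors?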
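The trade-off can be read off directly by evaluating the two terms of the objective at the fitted solution for several values of C. A sketch, assuming scikit-learn's `SVC` and a hypothetical overlapping dataset from `make_blobs`; for exact minimizers $\frac{1}{2}\|w\|^2$ should not decrease and $\sum_i \xi_i$ should not increase as C grows (the solver tolerance can blur very small differences).

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical overlapping data, so some slack is always needed
X, y01 = make_blobs(n_samples=80, centers=[[-1, -1], [1, 1]],
                    cluster_std=1.5, random_state=0)
y = 2 * y01 - 1                                   # {0, 1} -> {-1, +1}

half_norms, slack_sums = [], []
for C in [0.001, 0.01, 0.1, 1, 10, 100]:
    svm = SVC(kernel="linear", C=C).fit(X, y)
    w = svm.coef_[0]
    xi = np.maximum(0, 1 - y * svm.decision_function(X))   # optimal slacks
    half_norms.append(0.5 * w @ w)
    slack_sums.append(xi.sum())
    print(f"C={C:>7}: 0.5*||w||^2 = {half_norms[-1]:9.4f}, sum(xi) = {slack_sums[-1]:9.4f}")
```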

```python
# Visualize the effect of the C parameter

C_values = [0.01, 0.1, 1, 100]
fig, axes = plt.subplots(2, 2, figsize=(14, 12))
axes = axes.ravel()

# Evaluation grid for the decision function (the same for every panel)
xx, yy = np.meshgrid(np.linspace(-6, 6, 200), np.linspace(-6, 6, 200))

for idx, C in enumerate(C_values):
    # Train the SVM
    svm = SVC(kernel='linear', C=C)
    svm.fit(X, y)

    ax = axes[idx]

    # Decision boundary (level 0) and margin boundaries (levels ±1)
    Z = svm.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # Contour plot
    ax.contour(xx, yy, Z, levels=[-1, 0, 1], linewidths=[2, 3, 2],
               linestyles=['--', '-', '--'], colors=['b', 'g', 'r'])

    # Plot the data
    ax.scatter(X_pos[:, 0], X_pos[:, 1], c='red', marker='o', s=50, edgecolors='k', alpha=0.6)
    ax.scatter(X_neg[:, 0], X_neg[:, 1], c='blue', marker='s', s=50, edgecolors='k', alpha=0.6)
    ax.scatter(X_outlier_pos[:, 0], X_outlier_pos[:, 1], c='red', marker='*', s=300,
               edgecolors='black', linewidths=2)
    ax.scatter(X_outlier_neg[:, 0], X_outlier_neg[:, 1], c='blue', marker='*', s=300,
               edgecolors='black', linewidths=2)

    # Support vectors
    ax.scatter(svm.support_vectors_[:, 0], svm.support_vectors_[:, 1],
               s=200, facecolors='none', edgecolors='yellow', linewidths=3, label='Support Vectors')

    # Margin width = 2 / ||w||
    w = svm.coef_[0]
    margin = 2 / np.linalg.norm(w)

    ax.set_xlim(-6, 6)
    ax.set_ylim(-6, 6)
    ax.set_xlabel('$x_1$', fontsize=12)
    ax.set_ylabel('$x_2$', fontsize=12)
    ax.set_title(f'C = {C}, Margin = {margin:.2f}, #SV = {len(svm.support_vectors_)}',
                 fontsize=12, fontweight='bold')
    ax.legend()
    ax.grid(alpha=0.3)
    ax.set_aspect('equal')

plt.tight_layout()
plt.show()

print("=== Effect of the C Parameter (typical behavior) ===")
print("\nSmall C (0.01):")
print("  - Wide margin")
print("  - Many support vectors")
print("  - Outliers have little influence")
print("\nLarge C (100):")
print("  - Narrow margin")
print("  - Fewer support vectors")
print("  - Sensitive to outliers")
```

## 4. Lagrangian and Dual Problem

### 4.1 Lagrangian

Lagrange multipliers: $\alpha_i \geq 0$ (for margin constraints), $\mu_i \geq 0$ (for $\xi_i \geq 0$)

$$\mathcal{L}(w, b, \xi, \alpha, \mu) = \frac{1}{2}||w||^2 + C\sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i [y_i(w^T x_i + b) - 1 + \xi_i] - \sum_{i=1}^{n} \mu_i \xi_i$$

### 4.2 KKT Stationarity Conditions

$$\frac{\partial \mathcal{L}}{\partial w} = w - \sum_{i=1}^{n} \alpha_i y_i x_i = 0 \quad \Rightarrow \quad w = \sum_{i=1}^{n} \alpha_i y_i x_i$$

$$\frac{\partial \mathcal{L}}{\partial b} = -\sum_{i=1}^{n} \alpha_i y_i = 0 \quad \Rightarrow \quad \sum_{i=1}^{n} \alpha_i y_i = 0$$

$$\frac{\partial \mathcal{L}}{\partial \xi_i} = C - \alpha_i - \mu_i = 0 \quad \Rightarrow \quad \alpha_i + \mu_i = C$$
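The first two stationarity conditions can be checked on a fitted model. This sketch assumes scikit-learn's documented convention that, for a binary `SVC`, `dual_coef_[0]` stores $y_i \alpha_i$ for each support vector (indices in `support_`); the dataset is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y01 = make_blobs(n_samples=60, centers=[[-1, -1], [1, 1]],
                    cluster_std=1.2, random_state=0)
y = 2 * y01 - 1
svm = SVC(kernel="linear", C=1.0).fit(X, y)

# For a binary SVC, dual_coef_[0] holds y_i * alpha_i for each support vector
y_sv = y[svm.support_]
alpha_sv = np.abs(svm.dual_coef_[0])

# dL/dw = 0  =>  w = sum_i alpha_i y_i x_i  (non-support vectors have alpha_i = 0)
w_from_alpha = (alpha_sv * y_sv) @ X[svm.support_]
print(np.allclose(w_from_alpha, svm.coef_[0]))

# dL/db = 0  =>  sum_i alpha_i y_i = 0
print(np.sum(alpha_sv * y_sv))
```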

### 4.3 Key Insight

Since $\alpha_i \geq 0$, $\mu_i \geq 0$, and $\alpha_i + \mu_i = C$:

$$0 \leq \alpha_i \leq C$$

**Box constraint**!

### 4.4 Dual Problem

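Substituting these conditions back into $\mathcal{L}$, the $\xi_i$ terms vanish (their coefficient is $C - \alpha_i - \mu_i = 0$), and we obtain **exactly the same form** as the Hard Margin dual: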

$$\begin{align}
\max_{\alpha} \quad & \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^T x_j \\
\text{subject to} \quad & 0 \leq \alpha_i \leq C, \quad i = 1, \ldots, n \\
& \sum_{i=1}^{n} \alpha_i y_i = 0
\end{align}$$

**Difference**: $\alpha_i \geq 0$ → $0 \leq \alpha_i \leq C$
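A numerical sketch of the dual at work: it checks the box constraint on the $\alpha_i$ returned by scikit-learn's `SVC`, then compares the dual objective with the primal objective at the recovered $(w, b)$. At the optimum the two should agree (zero duality gap) up to solver tolerance. The dataset and `C = 0.5` are hypothetical, and the `dual_coef_` convention is the same assumption as above.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y01 = make_blobs(n_samples=60, centers=[[-1, -1], [1, 1]],
                    cluster_std=1.2, random_state=0)
y = 2 * y01 - 1
C = 0.5
svm = SVC(kernel="linear", C=C, tol=1e-6).fit(X, y)

# Full alpha vector (zeros for non-support vectors)
alpha = np.zeros(len(y))
alpha[svm.support_] = np.abs(svm.dual_coef_[0])

# Box constraint 0 <= alpha_i <= C
print(alpha.min() >= 0, alpha.max() <= C + 1e-9)

# Dual objective: sum(alpha) - 1/2 alpha^T Q alpha, with Q_ij = y_i y_j x_i^T x_j
Q = np.outer(y, y) * (X @ X.T)
dual_obj = alpha.sum() - 0.5 * alpha @ Q @ alpha

# Primal objective at the recovered (w, b), using the optimal slacks
w = svm.coef_[0]
xi = np.maximum(0, 1 - y * svm.decision_function(X))
primal_obj = 0.5 * w @ w + C * xi.sum()

print(primal_obj, dual_obj)   # nearly equal at the optimum
```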

## 5. KKT Complementary Slackness

### 5.1 Conditions

1. $\alpha_i [y_i(w^T x_i + b) - 1 + \xi_i] = 0$
2. $\mu_i \xi_i = 0$
3. $\alpha_i + \mu_i = C$

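### 5.2 Case Analysis

**Case 1**: $\alpha_i = 0$
- $\mu_i = C > 0$ → $\xi_i = 0$
- Primal feasibility then gives $y_i(w^T x_i + b) \geq 1$ (on or outside the margin, correctly classified)
- **Non-support vector**

**Case 2**: $0 < \alpha_i < C$
- $\mu_i = C - \alpha_i > 0$ → $\xi_i = 0$
- Since $\alpha_i > 0$, condition 1 gives $y_i(w^T x_i + b) = 1 - \xi_i = 1$ (exactly on the margin boundary)
- **Support vector on the margin**

**Case 3**: $\alpha_i = C$
- $\mu_i = 0$ → condition 2 no longer forces $\xi_i = 0$, so $\xi_i \geq 0$ is free
- Condition 1 gives $y_i(w^T x_i + b) = 1 - \xi_i$
- **Sub-cases**:
  - $\xi_i < 1$: correctly classified, on or inside the margin
  - $\xi_i = 1$: on the decision boundary
  - $\xi_i > 1$: misclassified
- **Support vector inside the margin or misclassified**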
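The three cases can be verified on a fitted model by grouping points by $\alpha_i$ and checking $y_i(w^T x_i + b)$ in each group. A sketch assuming scikit-learn's `SVC` (same `dual_coef_` convention as above) on hypothetical data; the thresholds `eps` and `1e-3` are arbitrary numerical tolerances.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y01 = make_blobs(n_samples=60, centers=[[-1, -1], [1, 1]],
                    cluster_std=1.2, random_state=0)
y = 2 * y01 - 1
C = 0.5
svm = SVC(kernel="linear", C=C, tol=1e-6).fit(X, y)

alpha = np.zeros(len(y))
alpha[svm.support_] = np.abs(svm.dual_coef_[0])
yf = y * svm.decision_function(X)                  # y_i (w^T x_i + b)
eps = 1e-6

case1 = alpha < eps                                # alpha_i = 0
case2 = (alpha >= eps) & (alpha <= C - eps)        # 0 < alpha_i < C
case3 = alpha > C - eps                            # alpha_i = C

print("counts:", case1.sum(), case2.sum(), case3.sum())
print("Case 1: y f >= 1 ?", np.all(yf[case1] >= 1 - 1e-3))
print("Case 2: y f == 1 ?", np.allclose(yf[case2], 1, atol=1e-3))
print("Case 3: y f <= 1 ?", np.all(yf[case3] <= 1 + 1e-3))
```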

```python
# Solve the Soft Margin SVM dual directly
from scipy.optimize import minimize

# Data (including the outliers)
X_train = np.vstack([X_pos[:5], X_neg[:5], X_outlier_pos, X_outlier_neg])
y_train = np.hstack([y_pos[:5], y_neg[:5], y_outlier_pos, y_outlier_neg])

n_samples = len(y_train)
C_param = 1.0

# Kernel (Gram) matrix and Q_ij = y_i y_j x_i^T x_j
K = X_train @ X_train.T
Q = np.outer(y_train, y_train) * K

# SciPy minimizes, so we minimize the negated dual objective
def dual_objective(alpha):
    return 0.5 * alpha @ Q @ alpha - np.sum(alpha)

def dual_gradient(alpha):
    return Q @ alpha - np.ones(n_samples)

# Constraints: sum_i alpha_i y_i = 0 and the box 0 <= alpha_i <= C
constraints = {'type': 'eq', 'fun': lambda alpha: np.sum(alpha * y_train)}
bounds = [(0, C_param) for _ in range(n_samples)]  # Box constraint!

# Optimization
alpha_init = np.ones(n_samples) * 0.1
result = minimize(
    dual_objective,
    alpha_init,
    method='SLSQP',
    jac=dual_gradient,
    bounds=bounds,
    constraints=constraints
)

alpha_opt = result.x

print("=== Solving the Soft Margin SVM ===")
print(f"C = {C_param}")
print("\nOptimal α:")
for i, a in enumerate(alpha_opt):
    print(f"  α_{i} = {a:.4f}")

# Recover w = sum_i alpha_i y_i x_i
w_opt = (alpha_opt * y_train) @ X_train
print(f"\nw = {w_opt}")

# Classify the points by their alpha value
print("\n=== Support Vector Analysis ===")
eps = 1e-5

for i in range(n_samples):
    if alpha_opt[i] < eps:
        sv_type = "Non-SV (α=0)"
    elif alpha_opt[i] < C_param - eps:
        sv_type = "SV on margin (0 < α < C)"
    else:
        sv_type = "SV inside/misclassified (α=C)"

    print(f"Point {i}: α={alpha_opt[i]:.4f}, {sv_type}")

# Compute b from the margin SVs (0 < α < C), where y_i(w^T x_i + b) = 1
margin_sv = np.where((alpha_opt > eps) & (alpha_opt < C_param - eps))[0]
if len(margin_sv) > 0:
    b_opt = np.mean([y_train[i] - w_opt @ X_train[i] for i in margin_sv])
else:
    # Fallback when no margin SVs exist: the KKT conditions only bound b.
    # With r_i = y_i - w^T x_i: α_i = 0 gives y_i b >= y_i r_i, α_i = C gives y_i b <= y_i r_i.
    r = y_train - X_train @ w_opt
    at_zero, at_C = alpha_opt < eps, alpha_opt > C_param - eps
    lower = r[((y_train > 0) & at_zero) | ((y_train < 0) & at_C)]
    upper = r[((y_train < 0) & at_zero) | ((y_train > 0) & at_C)]
    b_opt = 0.5 * (lower.max() + upper.min())  # midpoint of the feasible interval

print(f"\nb = {b_opt:.4f}")

# Compute the slack variables
print("\n=== Slack Variables ===")
for i in range(n_samples):
    margin_val = y_train[i] * (w_opt @ X_train[i] + b_opt)
    xi = max(0, 1 - margin_val)
    print(f"Point {i}: y(w^T x + b) = {margin_val:.4f}, ξ = {xi:.4f}")
```

## 6. Soft Margin vs Hard Margin

| Aspect | Hard Margin | Soft Margin |
|------|-------------|-------------|
| **Constraint** | $y_i(w^T x_i + b) \geq 1$ | $y_i(w^T x_i + b) \geq 1 - \xi_i$ |
| **Objective** | $\frac{1}{2}\lVert w \rVert^2$ | $\frac{1}{2}\lVert w \rVert^2 + C\sum_i \xi_i$ |
| **Dual constraint** | $\alpha_i \geq 0$ | $0 \leq \alpha_i \leq C$ |
| **Requirement** | Linearly separable data | A solution always exists |
| **Outliers** | Very sensitive | More robust (each $\alpha_i \leq C$) |
| **Hyperparameter** | None | C (regularization) |

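### 6.1 The Limit C → ∞

- For **linearly separable** data, as $C \to \infty$: $\sum_i \xi_i \to 0$ (all slacks become 0)
- The solution **becomes the Hard Margin solution**! (In fact this already happens for any finite $C$ at least as large as the largest Hard Margin $\alpha_i$.)
- For non-separable data the Hard Margin problem has no solution, and $\sum_i \xi_i$ cannot reach 0.

Hence the Soft Margin is a **generalization** of the Hard Margin.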
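For non-separable data the limit does not recover a Hard Margin solution, because none exists. A sketch on four hypothetical XOR-style points, assuming scikit-learn's `SVC`: for any $(w, b)$, $f(x) + f(-x) = 2b$, so the two $+1$ points give $\xi_1 + \xi_2 \geq 2 - 2b$ and the two $-1$ points give $\xi_3 + \xi_4 \geq 2 + 2b$. Hence $\sum_i \xi_i \geq 4$ no matter how large C is.

```python
import numpy as np
from sklearn.svm import SVC

# XOR-style points: no line separates them
X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
y = np.array([1, 1, -1, -1])

slack_sums = []
for C in [1, 10, 100, 1000]:
    svm = SVC(kernel="linear", C=C).fit(X, y)
    xi = np.maximum(0, 1 - y * svm.decision_function(X))
    slack_sums.append(xi.sum())
    print(f"C={C:>5}: sum(xi) = {xi.sum():.4f}")
```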

```python
# Visualize convergence as C varies

# Data without the outliers (very likely linearly separable for this seed)
X_clean = np.vstack([X_pos[:10], X_neg[:10]])
y_clean = np.hstack([y_pos[:10], y_neg[:10]])

C_range = np.logspace(-2, 3, 20)
margins = []
n_support_vectors = []

for C in C_range:
    svm = SVC(kernel='linear', C=C)
    svm.fit(X_clean, y_clean)

    w = svm.coef_[0]
    margin = 2 / np.linalg.norm(w)
    margins.append(margin)
    n_support_vectors.append(len(svm.support_vectors_))

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Margin vs C
ax = axes[0]
ax.semilogx(C_range, margins, 'o-', linewidth=2, markersize=8)
ax.set_xlabel('C (log scale)', fontsize=14)
ax.set_ylabel('Margin', fontsize=14)
ax.set_title('Converges to the Hard Margin as C → ∞', fontsize=14, fontweight='bold')
ax.grid(alpha=0.3)

# Number of SVs vs C
ax = axes[1]
ax.semilogx(C_range, n_support_vectors, 'o-', linewidth=2, markersize=8, color='orange')
ax.set_xlabel('C (log scale)', fontsize=14)
ax.set_ylabel('# Support Vectors', fontsize=14)
ax.set_title('Larger C → fewer support vectors', fontsize=14, fontweight='bold')
ax.grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("=== Effect of C ===")
print("\nSmall C:")
print(f"  Margin = {margins[0]:.4f}")
print(f"  # SVs = {n_support_vectors[0]}")
print("\nLarge C:")
print(f"  Margin = {margins[-1]:.4f}")
print(f"  # SVs = {n_support_vectors[-1]}")
print("\n→ Typical trend: C ↑ : Margin ↓, # SVs ↓")
```

## 7. Hinge Loss and SVM

### 7.1 Unconstrained Formulation

Because the optimal slack is $\xi_i = \max(0, 1 - y_i(w^T x_i + b))$, the Soft Margin SVM can be written in **unconstrained** form:

$$\min_{w, b} \frac{1}{2}||w||^2 + C\sum_{i=1}^{n} \max(0, 1 - y_i(w^T x_i + b))$$

**Hinge loss**: $\ell(y, f(x)) = \max(0, 1 - yf(x))$

### 7.2 Regularized ERM

$$\underbrace{\frac{\lambda}{2}||w||^2}_{\text{Regularization}} + \underbrace{\sum_{i=1}^{n} \ell(y_i, f(x_i))}_{\text{Empirical Risk}}$$

where $\lambda = \frac{1}{C}$ (this is the objective of 7.1 divided by $C$, which does not change the minimizer)

**The essence of SVM**: L2 regularization + hinge loss
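Because the unconstrained form has no constraints, it can be minimized by plain subgradient descent. This is a swapped-in technique for illustration, not the dual QP route used elsewhere in this notebook. The sketch assumes a hypothetical dataset, an arbitrary step size `0.01 / sqrt(t)` and 20,000 iterations, and compares the best objective reached with scikit-learn's `SVC` on the same objective.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y01 = make_blobs(n_samples=60, centers=[[-1, -1], [1, 1]],
                    cluster_std=1.2, random_state=0)
y = 2 * y01 - 1
C = 1.0

def hinge_objective(w, b):
    """1/2 ||w||^2 + C * sum_i max(0, 1 - y_i (w^T x_i + b))  (the form in 7.1)"""
    return 0.5 * w @ w + C * np.maximum(0, 1 - y * (X @ w + b)).sum()

w, b = np.zeros(X.shape[1]), 0.0
best_obj = hinge_objective(w, b)
for t in range(1, 20001):
    active = y * (X @ w + b) < 1                   # points with nonzero hinge loss
    grad_w = w - C * (y[active][:, None] * X[active]).sum(axis=0)   # a subgradient in w
    grad_b = -C * y[active].sum()                                   # a subgradient in b
    step = 0.01 / np.sqrt(t)                       # diminishing step size
    w, b = w - step * grad_w, b - step * grad_b
    best_obj = min(best_obj, hinge_objective(w, b))  # subgradient steps are not monotone

svm = SVC(kernel="linear", C=C, tol=1e-6).fit(X, y)
svc_obj = hinge_objective(svm.coef_[0], svm.intercept_[0])
print("subgradient descent:", best_obj)
print("SVC (dual solver)  :", svc_obj)
```

Subgradient descent converges slowly and the iterates oscillate around the kinks of the hinge, which is why the best objective seen so far is tracked instead of the last one.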

### 7.3 Comparison with Other Losses

- **0-1 Loss**: $\mathbb{1}[yf(x) < 0]$ (not convex, hard to optimize)
- **Hinge Loss**: $\max(0, 1 - yf(x))$ (convex, sparse)
- **Logistic Loss**: $\log(1 + e^{-yf(x)})$ (convex, smooth)
- **Squared Hinge**: $(\max(0, 1 - yf(x)))^2$ (differentiable)
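A quick check of the "upper bound of the 0-1 loss" property on a grid of $y f(x)$ values, using only NumPy. One subtlety: the logistic loss with the natural log equals $\ln 2 \approx 0.69$ at $yf(x) = 0$, so it is an upper bound of the 0-1 loss only when measured in base 2. The grid is chosen so that $0$ itself is not a sample point.

```python
import numpy as np

z = np.linspace(-3, 3, 600)                  # values of y * f(x); 0 is not on this grid
zero_one = (z < 0).astype(float)
hinge = np.maximum(0, 1 - z)
sq_hinge = np.maximum(0, 1 - z) ** 2
logistic_ln = np.logaddexp(0, -z)            # log(1 + e^{-z}), natural log
logistic_log2 = logistic_ln / np.log(2)      # the same loss measured in base 2

print("hinge >= 0-1        :", np.all(hinge >= zero_one))          # True
print("squared hinge >= 0-1:", np.all(sq_hinge >= zero_one))       # True
print("log2 logistic >= 0-1:", np.all(logistic_log2 >= zero_one))  # True
print("ln logistic >= 0-1  :", np.all(logistic_ln >= zero_one))    # False
```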

```python
# Compare different loss functions

def zero_one_loss(yf):
    return (yf < 0).astype(float)

def hinge_loss(yf):
    return np.maximum(0, 1 - yf)

def logistic_loss(yf):
    # log(1 + exp(-yf)) with the natural log, computed in a numerically stable way
    return np.logaddexp(0, -yf)

def squared_hinge(yf):
    return np.maximum(0, 1 - yf) ** 2

yf_range = np.linspace(-2, 3, 200)

plt.figure(figsize=(10, 6))
plt.plot(yf_range, zero_one_loss(yf_range), 'k-', linewidth=2, label='0-1 Loss')
plt.plot(yf_range, hinge_loss(yf_range), 'b-', linewidth=2, label='Hinge Loss')
plt.plot(yf_range, logistic_loss(yf_range), 'r-', linewidth=2, label='Logistic Loss')
plt.plot(yf_range, squared_hinge(yf_range), 'g-', linewidth=2, label='Squared Hinge')

plt.axvline(0, color='gray', linestyle='--', alpha=0.5, label='Decision boundary')
plt.axvline(1, color='orange', linestyle='--', alpha=0.5, label='Margin')

plt.xlabel('$y \\cdot f(x)$', fontsize=14)
plt.ylabel('Loss', fontsize=14)
plt.title('Classification Loss Functions Comparison', fontsize=16, fontweight='bold')
plt.legend(fontsize=12)
plt.grid(alpha=0.3)
plt.ylim(0, 4)
plt.show()

print("=== Loss Function Properties ===")
print("\n0-1 Loss:")
print("  - The actual classification error")
print("  - Non-convex → hard to optimize")
print("\nHinge Loss (SVM):")
print("  - Convex upper bound of the 0-1 loss")
print("  - Sparse solution (many α = 0)")
print("  - Built-in notion of margin")
print("\nLogistic Loss:")
print("  - Smooth, differentiable everywhere")
print("  - Probabilistic interpretation")
print("  - Dense solution (every point has influence)")
```

## 8. Summary

### 8.1 Soft Margin SVM Essentials

1. **Slack variable** $\xi_i$:
   - Measures the degree of margin violation
   - $\xi_i = \max(0, 1 - y_i(w^T x_i + b))$ (hinge loss)

2. **C parameter**:
   - Margin vs. error trade-off
   - $C \to \infty$: Hard Margin (for linearly separable data)
   - $C \to 0$: the margin term dominates ($w \to 0$, very wide margin, many violations)

3. **Dual formulation**:
   - Almost identical to the Hard Margin dual
   - Difference: $0 \leq \alpha_i \leq C$ (box constraint)

4. **Support Vectors**:
   - $\alpha_i = 0$: Non-SV
   - $0 < \alpha_i < C$: SV on margin
   - $\alpha_i = C$: SV inside margin or misclassified

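### 8.2 In Practice

- **Use the Soft Margin** in practice (the Hard Margin is too restrictive)
- Choose **C by cross-validation**
- Solving the dual enables the **kernel trick** (the data enter only through the inner products $x_i^T x_j$)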
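Choosing C by cross-validation can be sketched with scikit-learn's `GridSearchCV`. The noisy dataset (via `make_classification` with `flip_y=0.1`), the log-spaced grid and `cv=5` are illustrative assumptions, not recommendations.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Hypothetical noisy data: flip_y=0.1 assigns about 10% of the labels at random
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, flip_y=0.1, random_state=0)

C_grid = np.logspace(-3, 2, 6)              # 0.001, 0.01, ..., 100
search = GridSearchCV(SVC(kernel="linear"), param_grid={"C": C_grid}, cv=5)
search.fit(X, y)

print("best C          :", search.best_params_["C"])
print("mean CV accuracy:", round(search.best_score_, 4))
```

The selected C is the one with the best mean validation accuracy across folds; `search.best_estimator_` is then refit on all of `X` with that C.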

### 8.3 Next Steps

- Kernel SVM (비선형 분류)
- SMO algorithm (효율적 QP solver)
- Multi-class SVM

## 9. Exercises

### Problem 1: Computing Slack Variables

Given the hyperplane $w = [1, 1]^T$, $b = 0$, compute the slack variable of each of the following points:

1. $x_1 = [2, 2]^T$, $y_1 = +1$
2. $x_2 = [1, 0.5]^T$, $y_2 = +1$
3. $x_3 = [-0.5, -0.5]^T$, $y_3 = -1$
4. $x_4 = [0.5, 0.5]^T$, $y_4 = -1$

```python
# Problem 1 solution

w = np.array([1, 1])
b = 0

points = [
    {'x': np.array([2, 2]), 'y': 1},
    {'x': np.array([1, 0.5]), 'y': 1},
    {'x': np.array([-0.5, -0.5]), 'y': -1},
    {'x': np.array([0.5, 0.5]), 'y': -1},
]

print("=== Problem 1: Computing Slack Variables ===")
print(f"w = {w}, b = {b}\n")

for i, p in enumerate(points, 1):
    x, y = p['x'], p['y']

    # Functional margin y(w^T x + b)
    margin = y * (w @ x + b)

    # Slack
    xi = max(0, 1 - margin)

    # Category (see Section 2.2)
    if xi == 0:
        status = "on or outside the margin"
    elif xi < 1:
        status = "inside the margin (correctly classified)"
    elif xi == 1:
        status = "on the decision boundary"
    else:
        status = "misclassified"

    print(f"Point {i}: x = {x}, y = {y:+d}")
    print(f"  w^T x + b = {w @ x + b:.2f}")
    print(f"  y(w^T x + b) = {margin:.2f}")
    print(f"  ξ = max(0, 1 - {margin:.2f}) = {xi:.2f}")
    print(f"  Status: {status}\n")
```

### Problem 2: Choosing the C Parameter

Train SVMs on the given data with several values of C, then compute and compare:
1. Train accuracy
2. Number of support vectors
3. Margin width

```python
# Problem 2 solution
from sklearn.datasets import make_classification

# Generate data
X_data, y_data = make_classification(n_samples=100, n_features=2, n_redundant=0,
                                     n_clusters_per_class=1, random_state=42)
y_data = 2 * y_data - 1  # {0, 1} → {-1, +1}

C_values = [0.01, 0.1, 1, 10, 100]
results = []

for C in C_values:
    svm = SVC(kernel='linear', C=C)
    svm.fit(X_data, y_data)

    # Metrics
    train_acc = svm.score(X_data, y_data)
    n_sv = len(svm.support_vectors_)
    w = svm.coef_[0]
    margin = 2 / np.linalg.norm(w)

    results.append({
        'C': C,
        'train_acc': train_acc,
        'n_sv': n_sv,
        'margin': margin
    })

print("=== Problem 2: Comparing C Values ===")
print(f"{'C':>8} {'Train Acc':>12} {'# SVs':>8} {'Margin':>10}")
print("-" * 42)
for r in results:
    print(f"{r['C']:>8.2f} {r['train_acc']:>12.4f} {r['n_sv']:>8d} {r['margin']:>10.4f}")

# Visualization
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

C_vals = [r['C'] for r in results]

ax = axes[0]
ax.semilogx(C_vals, [r['train_acc'] for r in results], 'o-', linewidth=2, markersize=10)
ax.set_xlabel('C', fontsize=14)
ax.set_ylabel('Train Accuracy', fontsize=14)
ax.set_title('C vs Accuracy', fontsize=14, fontweight='bold')
ax.grid(alpha=0.3)

ax = axes[1]
ax.semilogx(C_vals, [r['n_sv'] for r in results], 'o-', linewidth=2, markersize=10, color='orange')
ax.set_xlabel('C', fontsize=14)
ax.set_ylabel('# Support Vectors', fontsize=14)
ax.set_title('C vs # SVs', fontsize=14, fontweight='bold')
ax.grid(alpha=0.3)

ax = axes[2]
ax.semilogx(C_vals, [r['margin'] for r in results], 'o-', linewidth=2, markersize=10, color='green')
ax.set_xlabel('C', fontsize=14)
ax.set_ylabel('Margin', fontsize=14)
ax.set_title('C vs Margin', fontsize=14, fontweight='bold')
ax.grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("\nObservations (typical trends):")
print("- C ↑ : Train acc ↑, # SVs ↓, Margin ↓")
print("- A very large C risks overfitting!")
```

## 10. Further Resources

### Lecture Materials
- ML_L15b_SVM_(linear-soft).pdf
- ML_L15c_SVM_(kernel).pdf

### Online Resources
1. **YouTube**:
   - StatQuest: Soft Margin SVM
   - Andrew Ng: SVM with Outliers

2. **Blogs**:
   - [SVM explained with cats](https://www.youtube.com/watch?v=_PwhiWxHK8o)
   - [Kernel Methods Tutorial](https://people.csail.mit.edu/)

3. **Papers**:
   - Cortes & Vapnik (1995): Support-Vector Networks
   - Vapnik (1998): Statistical Learning Theory

### Next Topics
- Kernel SVM
- SMO algorithm
- ν-SVM (alternative parameterization)