## 问答题
1. 支持向量机的基本思想是什么？

在类之间拟合可能的最宽的街道

2. 什么是支持向量？

SVM训练后，在“街道以外”的地方增加更多的训练实例根本不会对决策边界产生影响，也就是说，它完全由位于街道边缘的实例所决定（或者“支持”）。这些实例被称为支持向量

3. 在使用 SVM 时，缩放输入值为什么很重要？

SVM的拟合类别之间可能的、最宽的街道，所以如果训练集不经缩放，SVM将趋于忽略值较小的特征

4. SVM 分类器在对实例进行分类时能输出置信度分数吗？概率呢？

SVM分类器可以输出测试实例与决策边界之间的距离，可以将其用作信心分数。如果创建SVM时，设置probability=True，可以得到概率值

5. 你如何在 LinearSVC、SVC 和 SGDClassifier 之间进行选择？

如果数据可分，可以用LinearSVC，如果数据不可分，可以用SVC，如果数据量非常大，可以用SGDClassifier，如果想输出概率值，应该使用SVC

6. 假设你已经使用 RBF 核训练了一个 SVM 分类器，但它似乎欠拟合训练集。
   你应该增大还是减小 γ（gamma）？C 呢？

增大gamma或C来降低正则化

7. ε 不敏感模型是什么意思？

如果在间隔区域内添加更多的训练实例，它不会影响模型的预测。因此，该模型被称为∈不敏感。

8. 使用核技巧有什么意义？

使用核技巧，特征数量不会出现组合爆炸式增长

## 编程题
1. 在葡萄酒数据集上训练SVM分类器，可以使用sklearn.datasets.load_wine()加载它。该数据集包含3个不同种植者生产的178个葡萄酒样本的化学分析：目标是训练一个分类模型，该模型能够根据葡萄酒的化学分析预测种植者。由于SVM分类器是二元分类器，将需要使用“一对全部”对所有三个类进行分类。能达到的精度是多少？

   "一对全部"可以复习 **8_sklearn做分类.ipynb**里的笔记，里面提到了用二元分类器做多分类问题

---

2. 提前预习 **10_支持向量机.ipynb** 最新更新的笔记 （把SVM分类用梯度下降实现）； 大概理解笔记后，尝试自己对照笔记 实现用梯度下降实现SVM分类

   并把自定义的SVM分类用于 iris data(鸢尾花数据)； 取花瓣长度 和 花瓣宽度特征， 分类 看是不是 分类2的花 （(iris.target == 2)

   对比下sklearn自带的SVM分类 和 自定义SVM分类 实现的分类效果



In [1]:
from sklearn.datasets import load_wine
import numpy as np

wine = load_wine(as_frame=True)

In [2]:
X = wine.data
y = wine.target

In [3]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [4]:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

svm_clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', random_state=42))
svm_clf.fit(X_train, y_train)
y_pred = svm_clf.predict(X_test)

In [5]:
from sklearn.metrics import accuracy_score

accuracy = accuracy_score(y_test, y_pred)
accuracy

1.0

In [6]:
from sklearn.base import BaseEstimator

class MyLinearSVC(BaseEstimator):
    def __init__(self, C, eta0, n_epochs, random_state=None):
        self.C = C
        self.eta0 = eta0
        self.n_epochs = n_epochs
        self.random_state = random_state
        self._alpha = 1 / (2 * C)
        
    def eta(self):
        return self.eta0
    
    def fit(self, X, y):
        if self.random_state:
            np.random.seed(self.random_state)
        w = np.random.randn(X.shape[1], 1)
        b = 0
        t = np.array(y, dtype=np.float64).reshape(-1, 1) * 2 - 1
        m = X.shape[0]
        self.Js = []
        
        for epoch in range(self.n_epochs):
            support_vectors_idx = ((X@w + b)*t < 1).ravel()
            X_sv = X[support_vectors_idx]
            t_sv = t[support_vectors_idx]

            J = (np.sum(w * w) * self._alpha +  np.sum(1- t_sv * (X_sv@w + b))) / m
            self.Js.append(J)

            w_gradient_vector = (2*self._alpha*w  - X_sv.T @ t_sv) / m
            b_derivative = - np.sum(t_sv) / m

            w = w - self.eta() * w_gradient_vector
            b = b - self.eta() * b_derivative
            
        self.intercept_ = b
        self.coef_ = w
        support_vectors_idx = ((X@w + b)*t < 1).ravel()
        self.support_vectors_ = X[support_vectors_idx]
        return self
    
    def decision_function(self, X):
        return X.dot(self.coef_) + self.intercept_
    
    def predict(self, X):
        return self.decision_function(X) >= 0

In [7]:
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data[:, [2, 3]] 
y = iris.target == 2

X_test, X_train, y_test, y_train = train_test_split(X, y, test_size=0.2, random_state=42)

my_svm = make_pipeline(StandardScaler(), MyLinearSVC(C=1, eta0=0.1, n_epochs=1000, random_state=42))
my_svm.fit(X_train, y_train)
y_pred = my_svm.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

sklearn_svm = make_pipeline(StandardScaler(), SVC(kernel='linear', C=1, random_state=42))
sklearn_svm.fit(X_train, y_train)
y_pred = sklearn_svm.predict(X_test)
sklearn_accuracy = accuracy_score(y_test, y_pred)

accuracy, sklearn_accuracy

(0.9416666666666667, 0.9416666666666667)

在加州房屋数据集上训练和微调SVM回归器。可以使用原始数据集而不是 在课上使用的调整后的版本， 可以使用sklearn.datasets.fetch_california_housing()加载它。目标代表了数十万美元。 由于有超过20000个实例，SVM可能会很慢，因此对于超参数调整，应该使用更少的实例（例如2000个）来测试更多的超参数组合。最佳模型的RMSE是多少？

In [8]:
import numpy as np
import pandas as pd

data = pd.read_csv("data/california_housing_train.csv")
data

Unnamed: 0,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
0,-114.31,34.19,15.0,5612.0,1283.0,1015.0,472.0,1.4936,66900.0
1,-114.47,34.40,19.0,7650.0,1901.0,1129.0,463.0,1.8200,80100.0
2,-114.56,33.69,17.0,720.0,174.0,333.0,117.0,1.6509,85700.0
3,-114.57,33.64,14.0,1501.0,337.0,515.0,226.0,3.1917,73400.0
4,-114.57,33.57,20.0,1454.0,326.0,624.0,262.0,1.9250,65500.0
...,...,...,...,...,...,...,...,...,...
16995,-124.26,40.58,52.0,2217.0,394.0,907.0,369.0,2.3571,111400.0
16996,-124.27,40.69,36.0,2349.0,528.0,1194.0,465.0,2.5179,79000.0
16997,-124.30,41.84,17.0,2677.0,531.0,1244.0,456.0,3.0313,103600.0
16998,-124.30,41.80,19.0,2672.0,552.0,1298.0,478.0,1.9797,85800.0


In [25]:
from sklearn.svm import SVR

X = data.drop(columns=["median_house_value"])
y = data["median_house_value"]

X_sample, _, y_sample, _ = train_test_split(X, y, test_size=0.85, random_state=42)
X_test, X_train, y_test, y_train = train_test_split(X_sample, y_sample, test_size=0.2, random_state=42)
svm_pipeline = make_pipeline(StandardScaler(), SVR())
X_sample

Unnamed: 0,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income
15971,-122.43,37.43,17.0,11999.0,2249.0,5467.0,1989.0,4.8405
13122,-121.88,37.28,33.0,2951.0,529.0,1288.0,521.0,4.1554
5962,-118.21,33.97,35.0,1863.0,537.0,2274.0,510.0,2.1005
2496,-117.62,33.43,24.0,1296.0,384.0,850.0,367.0,2.7545
14298,-122.09,37.37,34.0,2165.0,355.0,776.0,339.0,5.2971
...,...,...,...,...,...,...,...,...
11284,-121.13,37.74,21.0,2376.0,475.0,1175.0,441.0,3.6016
11964,-121.38,38.62,41.0,774.0,144.0,356.0,150.0,3.5625
5390,-118.15,33.91,38.0,901.0,205.0,760.0,208.0,2.9643
860,-117.07,32.56,9.0,3648.0,895.0,3293.0,840.0,3.0992


In [23]:
from sklearn.model_selection import GridSearchCV

param_grid = [
    {
        'svr__kernel': ['linear'],
        'svr__C': [0.1, 1, 10]
    },
    {
        'svr__kernel': ['rbf'],
        'svr__C': [0.1, 1, 10],
        'svr__gamma': ['scale', 'auto', 0.1, 1]
    }
]
grid_search = GridSearchCV(svm_pipeline, param_grid, cv=3, scoring='neg_root_mean_squared_error')
grid_search.fit(X_sample, y_sample)

In [24]:
grid_search.best_estimator_, grid_search.best_score_, grid_search.best_params_

(Pipeline(steps=[('standardscaler', StandardScaler()),
                 ('svr', SVR(C=10, kernel='linear'))]),
 -110611.29884314285,
 {'svr__C': 10, 'svr__kernel': 'linear'})

In [29]:
y_pred = grid_search.predict(X_test)

from sklearn.metrics import mean_squared_error
test_rmse = mean_squared_error(y_test, y_pred)
test_rmse

11287842262.378962