# Support Vector Machines (Regression)

Train support vector regression (SVR) models with three different kernel configurations, and use each one to predict Boston house prices.

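```python
# Load the data
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell requires scikit-learn < 1.2
from sklearn.datasets import load_boston
boston = load_boston()
```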

```python
# Preview the dataset
print(boston.DESCR)
print(boston.data[:4])
print(boston.target[:4])
print(boston.data.shape)
print(boston.target.shape)
```

```text
Boston House Prices dataset

Notes
------
Data Set Characteristics:

    :Number of Instances: 506

    :Number of Attributes: 13 numeric/categorical predictive

    :Median Value (attribute 14) is usually the target

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
```

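```python
# Standardize the data: StandardScaler computes a separate mean and
# standard deviation for each of the 13 feature columns
from sklearn.preprocessing import StandardScaler
ss_x = StandardScaler()
ss_y = StandardScaler()
x = ss_x.fit_transform(boston.data)
# The target is 1-D: reshape it to one column for the scaler, then flatten it back
y = ss_y.fit_transform(boston.target.reshape(-1, 1)).ravel()
```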
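`StandardScaler` standardizes each column of a 2-D array independently. The sketch below uses a small hypothetical matrix (not the Boston data) to contrast per-column scaling with first flattening every value into a single column via `reshape(-1, 1)`. That second approach pools all features under one mean and one standard deviation, which leaves them on very different scales.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two features on very different scales
data = np.array([[1.0, 100.0],
                 [2.0, 200.0],
                 [3.0, 300.0],
                 [4.0, 400.0]])

# Per-column standardization: each feature gets its own mean and std
per_column = StandardScaler().fit_transform(data)

# Pooled standardization: all 8 values share a single mean and std
pooled = StandardScaler().fit_transform(data.reshape(-1, 1)).reshape(data.shape)

print(per_column.mean(axis=0))  # each column mean is ~0
print(per_column.std(axis=0))   # each column std is ~1
print(pooled.std(axis=0))       # the two column stds differ by a large factor
```

Because RBF and polynomial kernels depend on distances or dot products between samples, features left on mismatched scales can dominate those kernels.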

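```python
# Split the dataset; train_test_split lives in sklearn.model_selection
# (the old sklearn.cross_validation module was removed in scikit-learn 0.20)
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=33)
```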
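In the cells above, the scalers are fit on all 506 samples before the split, so statistics from the test rows leak into preprocessing. One common alternative is sketched below: split first, let a pipeline fit the feature scaler on the training split only, and let `TransformedTargetRegressor` scale the target the same way. The synthetic data from `make_regression` is an assumption standing in for the raw, unscaled Boston arrays.

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the raw (unscaled) features and targets
X_raw, y_raw = make_regression(n_samples=200, n_features=13, noise=10.0, random_state=0)

# Split first, so nothing about the test rows is seen during preprocessing
X_tr, X_te, y_tr, y_te = train_test_split(X_raw, y_raw, test_size=0.25, random_state=33)

# The pipeline fits StandardScaler on X_tr only; the wrapper does the same for y_tr
model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR(kernel='linear')),
    transformer=StandardScaler(),
)
model.fit(X_tr, y_tr)

# Predictions are mapped back to the original target units; score() reports R^2
r2 = model.score(X_te, y_te)
print(r2)
```

A side benefit of this design is that `model.predict` returns values in the original target units, so no manual `inverse_transform` is needed.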

```python
# Fit the models
from sklearn.svm import SVR  # support vector machine regression model

# Linear kernel
linear_svr = SVR(kernel='linear')
linear_svr.fit(x_train, y_train)
linear_svr_y_predict = linear_svr.predict(x_test)

# Polynomial kernel
poly_svr = SVR(kernel='poly')
poly_svr.fit(x_train, y_train)
poly_svr_y_predict = poly_svr.predict(x_test)

# Radial basis function (RBF) kernel
rbf_svr = SVR(kernel='rbf')
rbf_svr.fit(x_train, y_train)
rbf_svr_y_predict = rbf_svr.predict(x_test)
```
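For reference, the scikit-learn user guide defines the three kernels used above as follows. Here $\gamma$ (`gamma`), $r$ (`coef0`) and $d$ (`degree`) are `SVR` constructor parameters. With the defaults `degree=3` and `coef0=0.0`, the polynomial model above is cubic; in scikit-learn 0.22 and later `gamma` defaults to `'scale'`.

$$
\begin{aligned}
K_{\text{linear}}(\mathbf{x}, \mathbf{x}') &= \mathbf{x}^\top \mathbf{x}' \\
K_{\text{poly}}(\mathbf{x}, \mathbf{x}') &= \left(\gamma\, \mathbf{x}^\top \mathbf{x}' + r\right)^{d} \\
K_{\text{rbf}}(\mathbf{x}, \mathbf{x}') &= \exp\!\left(-\gamma \lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right)
\end{aligned}
$$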

```python
# Evaluate the models. Note: for regressors, score() returns the coefficient
# of determination R^2, not a classification accuracy
print("linear_svr's accuracy:", linear_svr.score(x_test, y_test))
print("poly_svr's accuracy:", poly_svr.score(x_test, y_test))
print("rbf_svr's accuracy:", rbf_svr.score(x_test, y_test))
```

```text
linear_svr's accuracy: 0.49618747316
poly_svr's accuracy: 0.376766392986
rbf_svr's accuracy: 0.36781135012
```
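For a regressor, `score()` returns the coefficient of determination $R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$. A value of 1.0 is a perfect fit, and 0.0 is no better than always predicting the mean. The sketch below checks this formula against `sklearn.metrics.r2_score` on made-up numbers.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical true targets and predictions (not from the Boston run)
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
manual_r2 = 1 - ss_res / ss_tot

print(manual_r2, r2_score(y_true, y_pred))  # both about 0.9486
```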


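**Analysis:** Support vector machine models can change their performance depending on which kernel function they are configured with. In the run above, the linear kernel achieved the highest R² (about 0.50), ahead of the polynomial (about 0.38) and RBF (about 0.37) kernels. Which kernel works best depends on the data and on hyperparameters such as `C`, `gamma`, and `degree`, so it is worth experimenting with several configurations.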
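One way to run such experiments is to let cross-validation choose the kernel and its regularization strength. The sketch below applies `GridSearchCV` to synthetic data from `make_regression`, an assumed stand-in for the standardized Boston features. The grid values are arbitrary illustrative choices.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic regression data (assumed stand-in for the standardized housing data)
X, y = make_regression(n_samples=150, n_features=5, noise=5.0, random_state=0)
y = (y - y.mean()) / y.std()  # standardize the target, as the notebook does

# Candidate kernels and regularization strengths (illustrative values only)
param_grid = {
    'kernel': ['linear', 'poly', 'rbf'],
    'C': [0.1, 1.0, 10.0],
}

# 5-fold cross-validation; with scoring=None it uses SVR.score, i.e. R^2
search = GridSearchCV(SVR(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```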

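> A **kernel function** is a very useful feature-mapping technique, although its mathematical description is somewhat involved. Put simply, it maps the original features into a higher-dimensional space in which the data become as close to linearly separable as possible (for regression, a space in which a linear function fits the targets as well as possible). Support vector machines build linear models, so features that are linear in this higher-dimensional space play directly to their strengths. The kernel returns the inner product of two mapped feature vectors directly, so the high-dimensional mapping never has to be computed explicitly.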
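The "map to a higher dimension" idea can be checked numerically. For 2-D inputs, the degree-2 polynomial kernel $K(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^\top \mathbf{z})^2$ equals an ordinary dot product after the explicit 3-D feature map $\phi(\mathbf{x}) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. The sketch below compares the two. The helper names `phi` and `poly2_kernel` are hypothetical, and the kernel corresponds to scikit-learn's polynomial kernel with `gamma=1`, `coef0=0`, `degree=2`.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (hypothetical helper)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly2_kernel(a, b):
    """Homogeneous degree-2 polynomial kernel, computed in the original 2-D space."""
    return np.dot(a, b) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = np.dot(phi(x), phi(z))  # dot product in the 3-D mapped space
implicit = poly2_kernel(x, z)      # same value, without building phi

print(explicit, implicit)  # both 1.0, up to floating-point rounding
```

The kernel only needs one 2-D dot product, while the explicit route builds two 3-D vectors first. For higher degrees or the RBF kernel, whose feature space is infinite-dimensional, this saving is what makes the technique practical.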