#### Model Performance Evaluation

+ Load the Boston housing price data
+ Generate second-degree (quadratic) features

In [1]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

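# The file is whitespace-delimited and has no header row.
data = pd.read_csv("Boston_Housing.csv",header=None,delimiter=r"\s+")

# Columns 0-12 are the 13 input features; column 13 is the target (house price).
X = data.iloc[:,0:13].to_numpy()
y = data.iloc[:,13].to_numpy().reshape(-1,1)

# Expand the features with all degree-2 terms (squares and pairwise products).
poly = PolynomialFeatures(degree=2,include_bias=False)
X_extend = poly.fit_transform(X)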

print("Shape of Extended Features:", X_extend.shape)
print("Shape of y:", y.shape)

Shape of Extended Features: (506, 104)
Shape of y: (506, 1)
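
The 104 columns reported above can be checked by hand: with `include_bias=False`, a degree-2 expansion of n features keeps the n original columns and adds one column per square and pairwise product, which is C(n+1, 2) more. The sketch below illustrates this on a toy two-feature row and then counts the columns for n = 13. The toy values and the variable names are assumptions made for illustration only.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy input with two features (x0 = 2, x1 = 3), chosen only for illustration.
toy = np.array([[2.0, 3.0]])
poly_toy = PolynomialFeatures(degree=2, include_bias=False)
toy_extended = poly_toy.fit_transform(toy)
print(toy_extended)  # columns are x0, x1, x0^2, x0*x1, x1^2

# For n original features: n linear terms plus C(n+1, 2) degree-2 terms.
n = 13
expected_columns = n + comb(n + 1, 2)

# Fit on a dummy row only to read the number of generated columns.
poly_13 = PolynomialFeatures(degree=2, include_bias=False).fit(np.zeros((1, n)))
print(expected_columns, poly_13.n_output_features_)
```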


**Evaluating the Least-Squares Regression Model**


$$(\mathbf{w}^*,b^*) = \arg\min_{\mathbf{w},b}{\frac{1}{m}\sum_{i=1}^m{(\mathbf{w}^t\mathbf{x}_i+b-y_i)^2}}$$

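+ Use `train_test_split` to split the data into a training set (X_train, y_train) and a test set (X_test, y_test); the default split ratio is 3:1 (`test_size=0.25`)
+ Fit a linear regression model on the training set (X_train, y_train)
+ Evaluate the model on both the training set (X_train, y_train) and the test set (X_test, y_test)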
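
As a minimal sketch of the objective and the steps above, the following example solves the least-squares problem directly with `np.linalg.lstsq` and compares the result with `LinearRegression`. It uses synthetic data rather than the Boston file, and names such as `X_demo`, `true_w`, and `w_ls` are hypothetical. The bias b is handled by appending a column of ones to the design matrix.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression data (an assumption for illustration, not the Boston data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y_demo = X_demo @ true_w + 4.0 + rng.normal(scale=0.1, size=200)

# Default test_size is 0.25, so 200 samples split into 150 train and 50 test.
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, random_state=0
)
print(Xd_train.shape, Xd_test.shape)

# Append a column of ones so the last solved coefficient plays the role of b.
A = np.hstack([Xd_train, np.ones((Xd_train.shape[0], 1))])
solution, *_ = np.linalg.lstsq(A, yd_train, rcond=None)
w_ls, b_ls = solution[:-1], solution[-1]

# LinearRegression should recover the same minimizer.
model = LinearRegression().fit(Xd_train, yd_train)
print(np.allclose(w_ls, model.coef_), np.isclose(b_ls, model.intercept_))
```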

In [11]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

X_train, X_test, y_train, y_test = train_test_split(X_extend, y, random_state=0)

print("Shape of train set:", X_train.shape, y_train.shape)
print("Shape of test set:", X_test.shape, y_test.shape)

LR = LinearRegression().fit(X_train, y_train)

print("\nLinear Regression:")
print("\t Train set score: {:.2f}".format(LR.score(X_train, y_train)))
print("\t Test set score: {:.2f}\n".format(LR.score(X_test, y_test)))

Shape of train set: (379, 104) (379, 1)
Shape of test set: (127, 104) (127, 1)

Linear Regression:
	 Train set score: 0.95
	 Test set score: 0.61
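
For regressors in scikit-learn, `score` returns the coefficient of determination R² = 1 - SS_res / SS_tot, so the gap between 0.95 on the training set and 0.61 on the test set suggests that the model with 104 features overfits. The sketch below recomputes R² by hand on synthetic data and compares it with `score`. The data and the names `X_r2`, `y_r2`, and `reg` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data, used only to demonstrate how the score is computed.
rng = np.random.default_rng(1)
X_r2 = rng.normal(size=(100, 2))
y_r2 = 3.0 * X_r2[:, 0] - X_r2[:, 1] + rng.normal(scale=0.5, size=100)

reg = LinearRegression().fit(X_r2, y_r2)
pred = reg.predict(X_r2)

# R^2 = 1 - (residual sum of squares) / (total sum of squares around the mean).
ss_res = np.sum((y_r2 - pred) ** 2)
ss_tot = np.sum((y_r2 - y_r2.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual, reg.score(X_r2, y_r2))
```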



#### Ridge Regression

$$(\mathbf{w}^*,b^*) = \arg\min_{\mathbf{w},b}{\frac{1}{m}\sum_{i=1}^m{(\mathbf{w}^t\mathbf{x}_i+b-y_i)^2} + \alpha\|\mathbf{w}\|_2^2}$$

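+ Fit a ridge regression model on the training set (X_train, y_train)
+ Evaluate the model on both the training set (X_train, y_train) and the test set (X_test, y_test)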
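
The ridge objective has a closed-form minimizer, which the sketch below compares with scikit-learn's `Ridge` on synthetic data. One caveat: `Ridge` minimizes the unnormalized sum of squared errors plus `alpha` times the squared norm of w, so its `alpha` corresponds to m times the α in the formula above, which carries a 1/m factor. The intercept is not penalized, so the example centers the data first. The data and the names `X_rd`, `w_closed`, and `b_closed` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data, used only to illustrate the closed-form solution.
rng = np.random.default_rng(2)
X_rd = rng.normal(size=(80, 4))
y_rd = X_rd @ np.array([2.0, 0.0, -1.0, 0.5]) + 1.0 + rng.normal(scale=0.3, size=80)
alpha = 10.0  # in scikit-learn's (unnormalized) convention

# Center features and target so the intercept drops out of the penalized problem.
x_mean, y_mean = X_rd.mean(axis=0), y_rd.mean()
Xc, yc = X_rd - x_mean, y_rd - y_mean

# Closed form: w = (Xc^T Xc + alpha * I)^{-1} Xc^T yc, then recover b from the means.
w_closed = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(Xc.shape[1]), Xc.T @ yc)
b_closed = y_mean - x_mean @ w_closed

ridge_demo = Ridge(alpha=alpha).fit(X_rd, y_rd)
print(np.allclose(w_closed, ridge_demo.coef_), np.isclose(b_closed, ridge_demo.intercept_))
```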

In [19]:
from sklearn.linear_model import Ridge

ridge = Ridge(alpha=100).fit(X_train, y_train)

print("Ridge Linear Regression:")
print("\t Training set score: {:.2f}".format(ridge.score(X_train, y_train)))
print("\t Test set score: {:.2f}\n".format(ridge.score(X_test, y_test)))

Ridge Linear Regression:
	 Training set score: 0.93
	 Test set score: 0.76
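
Compared with plain least squares, the ridge model gives up a little training accuracy (0.93 versus 0.95) for a clearly better test score (0.76 versus 0.61). One way to see the mechanism is to watch the coefficient norm shrink as `alpha` grows. The sketch below does this on synthetic data. The grid of `alpha` values and the names `X_sw`, `y_sw`, and `coef_norms` are assumptions for illustration, not values tuned for the Boston data.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data where only the first two of ten features matter.
rng = np.random.default_rng(3)
X_sw = rng.normal(size=(60, 10))
y_sw = X_sw[:, 0] - 2.0 * X_sw[:, 1] + rng.normal(scale=0.5, size=60)

# Hypothetical grid of regularization strengths.
alphas = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
coef_norms = []
for a in alphas:
    model_a = Ridge(alpha=a).fit(X_sw, y_sw)
    norm_a = float(np.linalg.norm(model_a.coef_))
    coef_norms.append(norm_a)
    print("alpha={:>7}: ||w||_2 = {:.4f}".format(a, norm_a))
```

In practice the value of `alpha` would be chosen on held-out data or by cross-validation rather than fixed in advance.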

