### House Price Prediction

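**Predicting house prices in a region**
+ Data file: Boston_Housing.csv
+ Original attributes
    - $x_1$: per-capita crime rate by town
    - $x_2$: proportion of residential land zoned for large lots
    - $x_3$: proportion of non-retail business acres per town
    - $x_4$: whether the tract borders the Charles River (1 if yes, 0 otherwise)
    - $x_5$: nitric oxides concentration
    - $x_6$: average number of rooms per dwelling
    - $x_7$: proportion of owner-occupied units built before 1940
    - $x_8$: weighted distance to five Boston employment centres
    - $x_9$: index of accessibility to radial highways
    - $x_{10}$: property-tax rate
    - $x_{11}$: pupil-teacher ratio by town
    - $x_{12}$: $1000(B_k - 0.63)^2$, where $B_k$ is the proportion of Black residents by town
    - $x_{13}$: percentage of lower-status population
+ Prediction target $y$: median value of owner-occupied homes (in thousands of US dollars)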

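Linear prediction:
    $$ y = \mathbf{w}^t \mathbf{x} + b $$
Least-squares (pseudoinverse) solution, where $X$ is the data matrix with a column of ones appended so that the last coefficient is the bias $b$:
    $$ \left[\begin{matrix}\mathbf{w}^*\\ b \end{matrix}\right] = (X^t X)^{-1}X^t \mathbf{y}$$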
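
The closed-form solution can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses small synthetic data (not the Boston file), and the names `X_aug`, `theta`, `true_w` and `true_b` are hypothetical. It appends the column of ones, solves the normal equations, and compares the result with `np.linalg.lstsq`.

```python
import numpy as np

# Synthetic data (an assumption for illustration, not the Boston data):
# 50 samples, 3 attributes, generated from known weights and bias.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([[2.0], [-1.0], [0.5]])
true_b = 4.0
y = X @ true_w + true_b

# Append a column of ones so the bias is estimated as the last coefficient.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Normal equations: solve (X^t X) theta = X^t y without forming the inverse.
theta = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)

# np.linalg.lstsq minimizes the same squared error and also handles a
# rank-deficient X_aug, for which (X^t X) has no inverse.
theta_lstsq, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

w, b = theta[:-1], theta[-1, 0]
print("w:", w.ravel(), "b:", b)
```

Solving the linear system rather than computing $(X^t X)^{-1}$ explicitly is usually the numerically safer choice, especially when attributes are strongly correlated.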

In [1]:
import pandas as pd
import numpy as np

data = pd.read_csv("Boston_Housing.csv",header=None,delimiter=r"\s+")

X = data.iloc[:,0:13].to_numpy()
y = data.iloc[:,13].to_numpy().reshape(-1,1)

print("\nShape of X:", X.shape)
print("First two rows of X:\n", X[0:2,])
print("\nFirst ten rows of y:\n", y[0:10])


Shape of X: (506, 13)
First two rows of X:
 [[6.3200e-03 1.8000e+01 2.3100e+00 0.0000e+00 5.3800e-01 6.5750e+00
  6.5200e+01 4.0900e+00 1.0000e+00 2.9600e+02 1.5300e+01 3.9690e+02
  4.9800e+00]
 [2.7310e-02 0.0000e+00 7.0700e+00 0.0000e+00 4.6900e-01 6.4210e+00
  7.8900e+01 4.9671e+00 2.0000e+00 2.4200e+02 1.7800e+01 3.9690e+02
  9.1400e+00]]

First ten rows of y:
 [[24. ]
 [21.6]
 [34.7]
 [33.4]
 [36.2]
 [28.7]
 [22.9]
 [27.1]
 [16.5]
 [18.9]]
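
Because the file is read with `header=None`, the columns are addressed only by position. One possible variation, sketched here, attaches names while reading. The column names are the conventional abbreviations for the Boston housing data and are an assumption, since the file itself carries no header; the `io.StringIO` object is a stand-in for the file that holds just the first two records printed above.

```python
import io
import pandas as pd

# Conventional column names for the Boston housing data; an assumption here,
# because the file in this section is read without a header row.
columns = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
           "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV"]

# Stand-in for Boston_Housing.csv: the first two records shown above.
sample = io.StringIO(
    "0.00632 18.0 2.31 0 0.538 6.575 65.2 4.0900 1 296.0 15.3 396.90 4.98 24.0\n"
    "0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671 2 242.0 17.8 396.90 9.14 21.6\n"
)

data = pd.read_csv(sample, header=None, sep=r"\s+", names=columns)

# Select attributes and target by name instead of by position.
X = data.drop(columns="MEDV").to_numpy()
y = data["MEDV"].to_numpy().reshape(-1, 1)

print(data[["RM", "LSTAT", "MEDV"]])
```

Named columns make later steps easier to read, for example when inspecting which attribute a fitted weight belongs to.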


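**Linear prediction**

The model $y = \mathbf{w}^t \mathbf{x} + b$ is fitted below with scikit-learn's `LinearRegression`, which solves the same least-squares problem as the pseudoinverse formula. `score` returns the coefficient of determination $R^2$, here computed on the same data used for fitting.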

In [2]:
from sklearn.linear_model import LinearRegression

LR = LinearRegression()
LR.fit(X,y)
predict_y = LR.predict(X)

print("   y            predict_y\n", np.append(y,predict_y,axis=1))
print("\n Score:", LR.score(X,y))

   y            predict_y
 [[24.         30.00384338]
 [21.6        25.02556238]
 [34.7        30.56759672]
 ...
 [23.9        27.6274261 ]
 [22.         26.12796681]
 [11.9        22.34421229]]

 Score: 0.7406426641094094
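
The value printed as `Score` is the coefficient of determination $R^2 = 1 - SS_{res}/SS_{tot}$. The following sketch recomputes that quantity by hand. The helper name `r2_score_manual` is hypothetical, and the three value pairs are copied from the output above purely as a toy input, so the printed number is not the score of the full model.

```python
import numpy as np

def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# First three (y, predict_y) pairs from the output above, used as a toy input.
y_demo = np.array([24.0, 21.6, 34.7])
pred_demo = np.array([30.00384338, 25.02556238, 30.56759672])

r2_demo = r2_score_manual(y_demo, pred_demo)
r2_perfect = r2_score_manual(y_demo, y_demo)                      # exact predictions
r2_mean = r2_score_manual(y_demo, np.full(3, y_demo.mean()))      # constant mean predictor

print(r2_demo, r2_perfect, r2_mean)
```

A perfect predictor gives $R^2 = 1$, and always predicting the mean of $y$ gives $R^2 = 0$, which provides a scale for reading the scores in this section.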


**Attribute expansion**

+ Expanded attributes
    - First-order terms: $x_1,\cdots,x_{13}$
    - Squared terms: $x_1^2,\cdots,x_{13}^2$
    - Second-order cross terms: $x_1x_2,x_1x_3,\cdots,x_{12}x_{13}$
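
The three groups listed above contain $13 + 13 + \binom{13}{2} = 104$ attributes in total. As a minimal sketch, the expansion can also be written by hand with NumPy; the function name `expand_degree2` and the demo array are hypothetical and only illustrate the construction.

```python
from itertools import combinations
from math import comb

import numpy as np

def expand_degree2(X):
    """Return [first-order | squared | cross] terms, grouped as in the list above."""
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    squares = X ** 2
    # One column per unordered pair (i, j) with i < j.
    cross = np.column_stack(
        [X[:, i] * X[:, j] for i, j in combinations(range(d), 2)]
    )
    return np.hstack([X, squares, cross])

d = 13
n_features = d + d + comb(d, 2)  # 13 first-order + 13 squared + 78 cross terms
print(n_features)

X_demo = np.arange(26, dtype=float).reshape(2, 13)
print(expand_degree2(X_demo).shape)
```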


In [3]:
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=2,include_bias=False)
X_extend = poly.fit_transform(X)

print("Shape of Extended Features:", X_extend.shape)

Shape of Extended Features: (506, 104)


**Quadratic prediction**
    $$ y = \mathbf{w}^t \mathbf{\tilde{x}} + b = w_1x_1 + w_2x_2 + \cdots + w_{13}x_{13} + w_{14}x_1^2 + \cdots + w_{26}x_{13}^2  + w_{27}x_1x_2 + w_{28}x_1x_3 + \cdots + w_{104}x_{12}x_{13} + b$$
Least-squares (pseudoinverse) solution, where $\tilde{X}$ is the expanded data matrix with a column of ones appended:
    $$ \left[\begin{matrix}\mathbf{w}^*\\ b \end{matrix}\right] = (\tilde{X}^t \tilde{X})^{-1}\tilde{X}^t \mathbf{y}$$

In [4]:
from sklearn.linear_model import LinearRegression

LR = LinearRegression()
LR.fit(X_extend,y)
predict_y = LR.predict(X_extend)

print("   y           predict_y\n", np.append(y,predict_y,axis=1))
print("\n Score:", LR.score(X_extend,y))

   y           predict_y
 [[24.         24.7918342 ]
 [21.6        22.70683946]
 [34.7        32.63160085]
 ...
 [23.9        22.83472267]
 [22.         20.99475095]
 [11.9        16.03188822]]

 Score: 0.9289961714593017
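
Both scores in this section are computed on the same 506 samples used for fitting, and the quadratic model has 104 attributes, so part of its higher score may reflect fitting the training data more closely rather than better prediction. One way to check is to hold out a test set. The sketch below uses synthetic data as a stand-in for `X` and `y` (an assumption, so its printed numbers say nothing about the Boston data); the split ratio and random seeds are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for (X, y); replace with the arrays loaded from the CSV.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, [0]] * X[:, [1]] + 2.0 * X[:, [2]] + rng.normal(scale=0.5, size=(200, 1))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit the expansion on the training part and reuse it for the test part.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_train_ext = poly.fit_transform(X_train)
X_test_ext = poly.transform(X_test)

linear = LinearRegression().fit(X_train, y_train)
quadratic = LinearRegression().fit(X_train_ext, y_train)

# (training R^2, held-out R^2) for each model.
scores = {
    "linear": (linear.score(X_train, y_train), linear.score(X_test, y_test)),
    "quadratic": (quadratic.score(X_train_ext, y_train),
                  quadratic.score(X_test_ext, y_test)),
}
for name, (train_r2, test_r2) in scores.items():
    print(f"{name:9s}  train R^2 = {train_r2:.3f}   test R^2 = {test_r2:.3f}")
```

On the training set the quadratic model can never score below the linear one, because the linear attributes are a subset of the expanded ones; the held-out score is the number that indicates whether the extra terms help.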
