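## 9.5 Linear Regression with TensorFlow
We solve a linear regression problem using the California housing dataset.  
For linear regression, the parameters can be computed in closed form by solving the following normal equation.  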
$$
    \hat{\theta} = ({\bf X}^T \cdot {\bf X})^{-1} \cdot {\bf X}^T \cdot {\bf y}
$$
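
As a quick sanity check of the formula, here is a minimal sketch that applies the normal equation to a small synthetic dataset. The data, the seed, and the names `X_b`, `theta_true`, and `theta_hat` are illustrative assumptions and are not part of the housing example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data (assumed for illustration): y = 4 + 3 * x + small Gaussian noise
m = 200
x = rng.uniform(0.0, 2.0, size=(m, 1))
theta_true = np.array([[4.0], [3.0]])
X_b = np.c_[np.ones((m, 1)), x]  # prepend the bias feature x_0 = 1
y = X_b @ theta_true + rng.normal(0.0, 0.1, size=(m, 1))

# Normal equation: theta_hat = (X^T X)^{-1} X^T y
theta_hat = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

print(theta_hat.ravel())
```

With low noise, the recovered intercept and slope should land close to the values used to generate the data.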

In [1]:
# Try linear regression in TensorFlow on the California housing dataset
import numpy as np
import tensorflow as tf
from sklearn.datasets import fetch_california_housing

# Load the data
housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]  # add the bias input feature (x_0 = 1)

In [2]:
# Build the computation graph (TensorFlow 1.x API; in TensorFlow 2.x, tf.matrix_inverse is tf.linalg.inv)
X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

In [3]:
# Run the graph in a session to compute the linear regression parameters θ
with tf.Session() as sess:
    theta_value = theta.eval()

In [4]:
theta_value

array([[-3.7185181e+01],
       [ 4.3633747e-01],
       [ 9.3952334e-03],
       [-1.0711310e-01],
       [ 6.4479220e-01],
       [-4.0338000e-06],
       [-3.7813708e-03],
       [-4.2348403e-01],
       [-4.3721911e-01]], dtype=float32)

In [5]:
# Run the same linear regression with scikit-learn
from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(housing.data, housing.target.reshape(-1, 1))

print(np.r_[lin_reg.intercept_.reshape(-1, 1), lin_reg.coef_.T])

[[-3.69419202e+01]
 [ 4.36693293e-01]
 [ 9.43577803e-03]
 [-1.07322041e-01]
 [ 6.45065694e-01]
 [-3.97638942e-06]
 [-3.78654265e-03]
 [-4.21314378e-01]
 [-4.34513755e-01]]
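
The TensorFlow result differs from the scikit-learn result from about the third or fourth significant digit onward. A likely contributor is precision: the graph above stores `X` and `y` as `tf.float32`, whereas scikit-learn and NumPy work in 64-bit floats by default. The sketch below illustrates the effect on synthetic, badly scaled data. The data and the helper `normal_equation` are hypothetical and not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, badly scaled features (assumed for illustration only)
m = 1000
X = np.c_[np.ones((m, 1)),
          rng.uniform(0.0, 1.0, size=(m, 1)),
          rng.uniform(0.0, 1000.0, size=(m, 1))]
theta_true = np.array([[1.0], [2.0], [0.003]])
y = X @ theta_true  # noise-free, so any error comes from floating-point rounding


def normal_equation(X, y):
    """Hypothetical helper: solve the normal equation in the dtype of the inputs."""
    return np.linalg.inv(X.T @ X) @ X.T @ y


theta64 = normal_equation(X, y)
theta32 = normal_equation(X.astype(np.float32), y.astype(np.float32))

# Largest absolute deviation from the parameters used to generate y
err64 = np.abs(theta64 - theta_true).max()
err32 = np.abs(theta32 - theta_true).max()
print("float64 max abs error:", err64)
print("float32 max abs error:", err32)
```

The float32 error is typically several orders of magnitude larger than the float64 error, because inverting `X.T @ X` amplifies rounding when the features have very different scales.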


In [6]:
# Compute the same thing with NumPy
X = housing_data_plus_bias
y = housing.target.reshape(-1, 1)
theta_numpy = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

print(theta_numpy)

[[-3.69419202e+01]
 [ 4.36693293e-01]
 [ 9.43577803e-03]
 [-1.07322041e-01]
 [ 6.45065694e-01]
 [-3.97638942e-06]
 [-3.78654265e-03]
 [-4.21314378e-01]
 [-4.34513755e-01]]
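
Forming the inverse of `X.T.dot(X)` explicitly works here, but it is not the only way to solve the problem. As a hedged alternative, the sketch below compares the explicit inverse with `np.linalg.lstsq` and `np.linalg.pinv`, which avoid inverting `X^T X` directly. The synthetic design matrix and the names `theta_inv`, `theta_lstsq`, and `theta_pinv` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the design matrix (assumed for illustration)
m = 500
X = np.c_[np.ones((m, 1)), rng.normal(size=(m, 3))]
theta_true = np.array([[0.5], [-1.0], [2.0], [3.0]])
y = X @ theta_true + rng.normal(0.0, 0.01, size=(m, 1))

# Explicit inverse, the same formula as the normal equation
theta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# Least-squares solver: minimizes ||X theta - y||^2 without forming (X^T X)^{-1}
theta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

# Moore-Penrose pseudoinverse: still defined when X^T X is singular
theta_pinv = np.linalg.pinv(X) @ y

print(np.allclose(theta_inv, theta_lstsq), np.allclose(theta_inv, theta_pinv))
```

On well-conditioned data the three approaches should agree to within floating-point tolerance. The solver-based variants are generally the safer choice when features are strongly correlated or badly scaled.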
