# Chapter 4. Multi-feature linear regression

## 1. Hypothesis
### 1.1. Theory
\begin{equation*}
H(x) = Wx + b
\end{equation*}

\begin{equation*}
H(x_1, x_2, x_3, \dots, x_n) = w_1x_1 + w_2x_2 + w_3x_3 + \dots + w_nx_n + b
\end{equation*}

### 1.2. Implementation by Matrix multiplication
\begin{equation*}
\begin{bmatrix}
x_1 & x_2 & x_3 & \cdots & x_n
\end{bmatrix}
\begin{bmatrix}
w_1\\
w_2\\
w_3\\
\vdots\\
w_n
\end{bmatrix} = x_1w_1 + x_2w_2 + x_3w_3 + \cdots + x_nw_n
\end{equation*}

\begin{equation*}
H(X) = XW
\end{equation*}
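- As the number of inputs grows, the written-out sum becomes too long, so the hypothesis is expressed as a matrix multiplication instead. This has the added advantage that all the sample data used for training can be placed in a single matrix and computed in one operation.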
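
As a small illustration of the equivalence above, the following NumPy sketch computes the weighted sum once term by term and once as a matrix product. The feature values are borrowed from the first row of the lab data later in this chapter; the weights are made up for the example.

```python
import numpy as np

# One sample with three features (row vector) and hypothetical weights (column vector)
x = np.array([[73.0, 80.0, 75.0]])      # shape (1, 3)
w = np.array([[0.5], [1.0], [0.5]])     # shape (3, 1), illustrative values only

# Term-by-term sum: x_1*w_1 + x_2*w_2 + x_3*w_3
explicit = sum(x[0, j] * w[j, 0] for j in range(x.shape[1]))

# The same quantity as a single matrix product
product = x @ w                          # shape (1, 1)

print(explicit, product[0, 0])
```

Both expressions give the same number; the matrix form simply scales better when more features or more samples are added.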

### 1.3. Matrix with 3 features (variables x1, x2, x3) and 5 instances (samples)
\begin{equation*}
\begin{bmatrix}
x_{11} & x_{12} & x_{13} \\
x_{21} & x_{22} & x_{23} \\
x_{31} & x_{32} & x_{33} \\
x_{41} & x_{42} & x_{43} \\
x_{51} & x_{52} & x_{53} \\
\end{bmatrix}
\begin{bmatrix}
w_1\\
w_2\\
w_3\\
\end{bmatrix} =
\begin{bmatrix}
x_{11}w_1 + x_{12}w_2 + x_{13}w_3 \\
x_{21}w_1 + x_{22}w_2 + x_{23}w_3 \\
x_{31}w_1 + x_{32}w_2 + x_{33}w_3 \\
x_{41}w_1 + x_{42}w_2 + x_{43}w_3 \\
x_{51}w_1 + x_{52}w_2 + x_{53}w_3
\end{bmatrix}
\end{equation*}

\begin{equation*}
[5, 3] [3, 1] = [5, 1]
\end{equation*}

\begin{equation*}
XW = H(X) \quad\Longrightarrow\quad [n, f]\,[f, y] = [n, y]
\end{equation*}

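- X is normally given, so it is already known. X holds n instances (data samples), each with f features (variables).
- The shape of the output H(X) is also known, because the targets are given. For linear regression y is 1; for multi-class logistic regression y can be greater than 1.
- The shape of W (weight) is therefore designed by taking f from X and y from H(X), which gives [f, y].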
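
The shape rule above can be checked with a minimal NumPy sketch. The names `n`, `f`, and `n_out` are chosen here for illustration, and the zero-filled `X` is only a stand-in for real input data.

```python
import numpy as np

n, f, n_out = 5, 3, 1                 # instances, features, outputs (y in the text)

X = np.zeros((n, f))                  # stand-in for the given input data
W = np.random.randn(f, n_out)         # weight shape taken from f and the output count
b = np.random.randn(n_out)            # one bias per output, broadcast over all rows

H = X @ W + b
print(X.shape, W.shape, H.shape)      # (5, 3) (3, 1) (5, 1)
```

Because the number of instances only appears in X and H(X), the same W works for any batch size, which is why the labs below can leave that dimension as `None`.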


## 2. Cost function
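\begin{equation*}
cost(W)=\frac{1}{m}\sum_{i=1}^{m}\left(H(x^{(i)}) - y^{(i)}\right)^2
\end{equation*}

\begin{equation*}
cost(W)=\frac{1}{m}\sum_{i=1}^{m}\left(H(x_1^{(i)}, x_2^{(i)}, x_3^{(i)}, \dots, x_n^{(i)}) - y^{(i)}\right)^2
\end{equation*}

- The superscript (i) is the sample index, not a power.
- The predicted value H(x1, x2, x3, ..., xn) is substituted in directly.
- In matrix form, with x^(i) denoting the i-th row of X and the bias omitted as in H(X) = XW, this becomes:

\begin{equation*}
cost(W)=\frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)}W - y^{(i)}\right)^2
\end{equation*}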
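
A minimal NumPy sketch of this cost, assuming the bias term b from Section 1.1 is included as well. The function name `mse_cost` and the tiny data set are hypothetical, chosen so the result is easy to verify by hand.

```python
import numpy as np

def mse_cost(X, W, b, Y):
    """Mean squared error between predictions X @ W + b and targets Y."""
    residual = X @ W + b - Y
    return float(np.mean(residual ** 2))

# Two samples with two features each (illustrative values only)
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
W = np.array([[1.0],
              [1.0]])
b = 0.0
Y = np.array([[3.0],
              [8.0]])

# Predictions are 3 and 7, so the residuals are 0 and -1
cost_value = mse_cost(X, W, b, Y)
print(cost_value)
```

With residuals 0 and -1 the mean of the squares is 0.5, which is what the function returns.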

## 3. Minimize Cost (Gradient descent algorithm)
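- The update rule is the same regardless of the number of features: as in the equation below, the new weight is the current weight minus the learning rate α multiplied by the derivative of the cost function with respect to W.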
\begin{equation*}
W := W - \alpha \frac{\partial}{\partial W}cost(W)
\end{equation*}
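
The update rule can be sketched by hand in NumPy, assuming the mean squared error cost from Section 2. Differentiating that cost gives the gradient (2/m) Xᵀ(XW + b − Y) for W and (2/m) times the sum of the residuals for b. The function names are hypothetical, and the data and learning rate are the ones used in Lab1 below; TensorFlow's optimizer computes the gradient automatically rather than from this closed form.

```python
import numpy as np

def cost(X, Y, W, b):
    """Mean squared error of the linear model X @ W + b."""
    return float(np.mean((X @ W + b - Y) ** 2))

def gradient_step(X, Y, W, b, alpha):
    """One gradient descent update of W and b for the MSE cost."""
    m = X.shape[0]
    residual = X @ W + b - Y                 # shape (m, 1)
    grad_W = (2.0 / m) * (X.T @ residual)    # shape (f, 1)
    grad_b = (2.0 / m) * residual.sum()
    return W - alpha * grad_W, b - alpha * grad_b

X = np.array([[73., 80., 75.],
              [93., 88., 93.],
              [89., 91., 90.],
              [96., 98., 100.],
              [73., 66., 70.]])
Y = np.array([[152.], [185.], [180.], [196.], [142.]])

W = np.zeros((3, 1))   # start from zero weights (an assumption for this sketch)
b = 0.0
before = cost(X, Y, W, b)
for _ in range(100):
    W, b = gradient_step(X, Y, W, b, alpha=1e-5)
after = cost(X, Y, W, b)
print(before, after)
```

The small learning rate matters here: the features are on the order of 100, so the gradient is large, and a much bigger step would make the cost diverge instead of shrink.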


## 4. Lab1: Multi feature linear regression using matrix multiplication

```python
################################################################################
# lab4-1 : Multi Variable Linear Regression
# Uses the TensorFlow 1.x graph/session API
################################################################################
import tensorflow as tf

tf.set_random_seed(777)  # for reproducibility

# 5 instances x 3 features
x_data = [[73., 80., 75.],
          [93., 88., 93.],
          [89., 91., 90.],
          [96., 98., 100.],
          [73., 66., 70.]]
# 5 instances x 1 target
y_data = [[152.],
          [185.],
          [180.],
          [196.],
          [142.]]

# Placeholders for the inputs and targets; None leaves the batch size open
X = tf.placeholder(tf.float32, shape=[None, 3])
Y = tf.placeholder(tf.float32, shape=[None, 1])

# Model parameters: W has shape [f, y] = [3, 1]
W = tf.Variable(tf.random_normal([3, 1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')

# Our hypothesis: H(X) = XW + b
hypothesis = tf.matmul(X, W) + b

# cost/loss function: mean squared error
cost = tf.reduce_mean(tf.square(hypothesis - Y))

# Minimize
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-5)
train = optimizer.minimize(cost)

# Launch the graph in a session
init = tf.global_variables_initializer()  # TensorFlow 1.0+ API
sess = tf.Session()
sess.run(init)  # initialize W and b with their random starting values

# Training: print the cost and predictions at the first and last step only
for step in range(10001):
    cost_val, hy_val, _ = sess.run([cost, hypothesis, train],
                                   feed_dict={X: x_data, Y: y_data})
    if step % 10000 == 0:
        print(step, "Cost:", cost_val, "\nPrediction:\n", hy_val)

sess.close()
```

0 Cost: 80516.38 
Prediction:
 [[ -93.077995]
 [-122.902214]
 [-115.32393 ]
 [-126.26742 ]
 [ -95.99668 ]]
10000 Cost: 0.40201458 
Prediction:
 [[151.72481]
 [184.42906]
 [180.69597]
 [196.68288]
 [141.18903]]


- After 10000 training steps, the cost has fallen from about 80516 to about 0.40.
- The final predictions are close to the original y_data.
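
As a quick check of the two observations above, the printed predictions can be compared with `y_data` directly. This NumPy sketch copies the final predictions from the output above and recomputes the mean squared error; it should land close to the reported cost of 0.40201458, up to the rounding of the printed values.

```python
import numpy as np

y_data = np.array([[152.], [185.], [180.], [196.], [142.]])

# Final predictions copied from the step-10000 output above
predictions = np.array([[151.72481],
                        [184.42906],
                        [180.69597],
                        [196.68288],
                        [141.18903]])

residuals = predictions - y_data
mse = float(np.mean(residuals ** 2))

print(residuals.ravel())
print(mse)
```

Every residual is smaller than 1 in absolute value, which is what "close to y_data" means here in concrete terms.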

## 5. Lab2: File input multi linear regression with numpy

```python
################################################################################
# lab4-2 : File Input Linear Regression
# Uses the TensorFlow 1.x graph/session API
################################################################################
import tensorflow as tf
import numpy as np

tf.set_random_seed(777)  # for reproducibility

# Load the whole CSV into memory; the last column is the target
xy = np.loadtxt('data-01-test-score.csv', delimiter=',', dtype=np.float32)
x_data = xy[:, 0:-1]   # all columns except the last
y_data = xy[:, [-1]]   # last column, kept 2-D so it matches shape [None, 1]

# Make sure the shape and data are OK
#print(x_data.shape, x_data, len(x_data))
#print(y_data.shape, y_data, len(y_data))

# Model parameters
W = tf.Variable(tf.random_normal([3, 1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')

# Placeholders for the inputs and targets; None leaves the batch size open
x = tf.placeholder(tf.float32, shape=[None, 3])
y = tf.placeholder(tf.float32, shape=[None, 1])

# Our hypothesis
linear_model = tf.matmul(x, W) + b

# cost/loss function: mean squared error
cost = tf.reduce_mean(tf.square(linear_model - y))

# Minimize
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-5)
train = optimizer.minimize(cost)

# Launch the graph in a session
init = tf.global_variables_initializer()  # TensorFlow 1.0+ API
sess = tf.Session()
sess.run(init)  # initialize W and b with their random starting values

# Training: print the cost and predictions at the first and last step only
for step in range(10001):
    cost_val, hy_val, _ = sess.run([cost, linear_model, train],
                                   feed_dict={x: x_data, y: y_data})
    if step % 10000 == 0:
        print(step, "Cost:", cost_val, "\nPrediction:\n", hy_val)

# Ask my score
print("\nYour score will be ", sess.run(linear_model,
                                        feed_dict={x: [[100, 70, 101]]}))

print("\nOther scores will be ", sess.run(linear_model,
                                          feed_dict={x: [[60, 70, 65], [90, 100, 80]]}))

sess.close()
```

0 Cost: 32313.89 
Prediction:
 [[-11.626006  ]
 [-22.355602  ]
 [-17.772346  ]
 [-17.851078  ]
 [-20.505215  ]
 [-14.032096  ]
 [ -8.969039  ]
 [ -0.19958979]
 [-21.132484  ]
 [-17.17147   ]
 [-12.689815  ]
 [-15.717196  ]
 [-19.987759  ]
 [-19.37931   ]
 [-10.117206  ]
 [-20.504623  ]
 [-22.891539  ]
 [ -7.7509565 ]
 [-18.005363  ]
 [-14.989589  ]
 [-12.150096  ]
 [-19.044708  ]
 [-11.306457  ]
 [-15.474425  ]
 [-22.502056  ]]
10000 Cost: 6.0806427 
Prediction:
 [[153.13841]
 [184.39146]
 [181.45366]
 [199.06929]
 [139.37843]
 [104.77288]
 [150.87257]
 [114.50329]
 [174.04066]
 [164.26408]
 [143.9617 ]
 [142.56738]
 [186.05742]
 [152.57445]
 [151.75441]
 [188.43651]
 [143.64365]
 [182.02278]
 [177.09578]
 [158.62404]
 [176.56282]
 [174.29689]
 [167.89342]
 [151.08755]
 [190.43071]]

Your score will be  [[185.46397]]

Other scores will be  [[132.06215]
 [175.28055]]
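
The slicing used in this lab to split the loaded array into features and targets can be tried on a small in-memory example. The CSV text below is a hypothetical stand-in built from three rows of the Lab1 data, not the contents of `data-01-test-score.csv`.

```python
import io
import numpy as np

# Hypothetical stand-in for a CSV file: three feature columns, then the target
csv_text = io.StringIO("73,80,75,152\n93,88,93,185\n89,91,90,180\n")

xy = np.loadtxt(csv_text, delimiter=',', dtype=np.float32)

x_data = xy[:, 0:-1]   # every column except the last
y_data = xy[:, [-1]]   # list index keeps the column axis, so the result is 2-D
y_flat = xy[:, -1]     # plain integer index drops the axis, so the result is 1-D

print(x_data.shape, y_data.shape, y_flat.shape)   # (3, 3) (3, 1) (3,)
```

The lab uses the `[-1]` form because the `y` placeholder is declared with shape `[None, 1]`; the 1-D variant would not match that shape when fed.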


## 6. Lab3: File input multi linear regression with tensorflow reader
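- Loading the file with numpy reads the entire dataset into memory at once, so a large enough file uses memory inefficiently and can cause an out-of-memory error.
- TensorFlow's queue runners are a library feature built to address this: they read data from one or more files, batch size records at a time, place them on a queue, and dequeue them as they are used.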
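
The idea of reading a file one batch at a time, instead of loading it whole, can be illustrated in plain Python. This is only an analogy under stated assumptions, not how TensorFlow's queue runners are implemented: the function `read_batches` and the in-memory file are hypothetical, and the sketch makes a single pass, whereas the queue in the lab below keeps cycling through the file.

```python
import csv
import io
from itertools import islice

def read_batches(lines, batch_size):
    """Yield lists of parsed rows, at most batch_size rows at a time."""
    reader = csv.reader(lines)
    while True:
        # Pull only the next batch_size rows from the file-like object
        batch = [[float(v) for v in row] for row in islice(reader, batch_size)]
        if not batch:
            return
        yield batch

# Hypothetical in-memory stand-in for a CSV file on disk
fake_file = io.StringIO(
    "73,80,75,152\n93,88,93,185\n89,91,90,180\n96,98,100,196\n73,66,70,142\n")

batch_sizes = [len(batch) for batch in read_batches(fake_file, batch_size=2)]
print(batch_sizes)   # [2, 2, 1]
```

Only one batch is held in memory at any moment, which is the property that makes the queue-based reader suitable for files too large for `np.loadtxt`.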

```python
################################################################################
# lab4-3 : File Input Linear Regression via tensorflow reader
# Uses the TensorFlow 1.x queue-based input pipeline
################################################################################

import tensorflow as tf

tf.set_random_seed(777)  # for reproducibility

# Queue of file names to read; more files can be added to the list
filename_queue = tf.train.string_input_producer(
    ['data-01-test-score.csv'], shuffle=False, name='filename_queue')

# Read one text line (one CSV record) at a time
reader = tf.TextLineReader()
key, value = reader.read(filename_queue)

# Default values, in case of empty columns.
# Also specifies the type of the decoded result
record_defaults = [[0.], [0.], [0.], [0.]]
xy = tf.decode_csv(value, record_defaults=record_defaults)

# Collect decoded records into batches of 10: features first, target last
train_x_batch, train_y_batch = \
    tf.train.batch([xy[0:-1], xy[-1:]], batch_size=10)

# Placeholders for the inputs and targets; None leaves the batch size open
x = tf.placeholder(tf.float32, shape=[None, 3])
y = tf.placeholder(tf.float32, shape=[None, 1])

# Model parameters (name belongs to tf.Variable, as in the earlier labs)
W = tf.Variable(tf.random_normal([3, 1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')

# Hypothesis
linear_model = tf.matmul(x, W) + b

# Simplified cost/loss function
cost = tf.reduce_mean(tf.square(linear_model - y))

# Minimize
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-5)
train = optimizer.minimize(cost)

# Launch the graph in a session
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

# Start populating the filename queue
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

# Training: fetch one batch from the queue, then run one training step on it
for step in range(10001):
    x_batch, y_batch = sess.run([train_x_batch, train_y_batch])
    cost_val, hy_val, _ = sess.run(
            [cost, linear_model, train], feed_dict={x: x_batch, y: y_batch})
    if step % 10000 == 0:
        print(step, "Cost: ", cost_val, "\nPrediction: ", hy_val)

# Stop the reader threads and wait for them to finish
coord.request_stop()
coord.join(threads)

# Ask my score
print("\nYour score will be ", sess.run(linear_model,
    feed_dict={x: [[100, 70, 101]]}))

print("\nOther scores will be ", sess.run(linear_model,
    feed_dict={x: [[60, 70, 65], [90, 100, 80]]}))

sess.close()
```

0 Cost:  18588.691 
Prediction:  [[18.754368]
 [24.329311]
 [22.763397]
 [26.297508]
 [18.118395]
 [16.608494]
 [21.977345]
 [19.277817]
 [25.36718 ]
 [27.474846]]
10000 Cost:  4.766433 
Prediction:  [[153.57343 ]
 [184.87018 ]
 [181.7545  ]
 [199.05394 ]
 [140.50648 ]
 [106.041916]
 [151.11375 ]
 [114.809906]
 [174.56572 ]
 [164.65022 ]]

Your score will be  [[185.9972]]

Other scores will be  [[132.17192]
 [175.46928]]
