## Boston Housing Dataset

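  * **CRIM**: per capita crime rate by town
  * **ZN**: proportion of residential land zoned for lots over 25,000 sq. ft.
  * **INDUS**: proportion of non-retail business acres per town
  * **CHAS**: Charles River dummy variable (1 if the tract bounds the river, 0 otherwise)
  * **NOX**: nitric oxide concentration (parts per 10 million)
  * **RM**: average number of rooms per dwelling
  * **AGE**: proportion of owner-occupied units built before 1940
  * **DIS**: weighted distance to five Boston employment centers
  * **RAD**: index of accessibility to radial highways
  * **TAX**: full-value property tax rate per $10,000
  * **PTRATIO**: pupil-teacher ratio by town
  * **B**: 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town
  * **LSTAT**: percentage of the population classed as lower status
  * **MEDV**: median value of owner-occupied homes (unit: $1,000)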
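
The **B** formula in the list above is easy to check numerically. The sketch below assumes Bk is given as a proportion between 0 and 1; the helper name `b_feature` is hypothetical and not part of the dataset or of scikit-learn.

```python
def b_feature(bk: float) -> float:
    """Evaluate B = 1000 * (Bk - 0.63)^2 for a proportion Bk (assumed to lie in [0, 1])."""
    return 1000.0 * (bk - 0.63) ** 2


# Print B for a few illustrative proportions.
for bk in (0.0, 0.25, 0.63, 1.0):
    print(f"Bk = {bk:.2f} -> B = {b_feature(bk):.2f}")
```

With Bk = 0 the formula gives 396.9, the value that appears in the B column of several rows shown later. Because of the square, B is not monotonic in Bk: two different proportions on either side of 0.63 can map to the same B.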

### Import Packages

In [1]:
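# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this import needs an older scikit-learn release.
from sklearn.datasets import load_boston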
import numpy as np
import pandas as pd

### Load Datasets

In [2]:
boston = load_boston()

In [3]:
boston.keys()

dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])

In [4]:
for key in boston.keys():
    print(key)
    print(boston[key])
    print('\n') # '\n' is the newline character.

data
[[6.3200e-03 1.8000e+01 2.3100e+00 ... 1.5300e+01 3.9690e+02 4.9800e+00]
 [2.7310e-02 0.0000e+00 7.0700e+00 ... 1.7800e+01 3.9690e+02 9.1400e+00]
 [2.7290e-02 0.0000e+00 7.0700e+00 ... 1.7800e+01 3.9283e+02 4.0300e+00]
 ...
 [6.0760e-02 0.0000e+00 1.1930e+01 ... 2.1000e+01 3.9690e+02 5.6400e+00]
 [1.0959e-01 0.0000e+00 1.1930e+01 ... 2.1000e+01 3.9345e+02 6.4800e+00]
 [4.7410e-02 0.0000e+00 1.1930e+01 ... 2.1000e+01 3.9690e+02 7.8800e+00]]


target
[24.  21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 15.  18.9 21.7 20.4
 18.2 19.9 23.1 17.5 20.2 18.2 13.6 19.6 15.2 14.5 15.6 13.9 16.6 14.8
 18.4 21.  12.7 14.5 13.2 13.1 13.5 18.9 20.  21.  24.7 30.8 34.9 26.6
 25.3 24.7 21.2 19.3 20.  16.6 14.4 19.4 19.7 20.5 25.  23.4 18.9 35.4
 24.7 31.6 23.3 19.6 18.7 16.  22.2 25.  33.  23.5 19.4 22.  17.4 20.9
 24.2 21.7 22.8 23.4 24.1 21.4 20.  20.8 21.2 20.3 28.  23.9 24.8 22.9
 23.9 26.6 22.5 22.2 23.6 28.7 22.6 22.  22.9 25.  20.6 28.4 21.4 38.7
 43.8 33.2 27.5 26.5 18.6 19.3 20.1 19.5 19.5

In [5]:
x = boston["data"]
print(x.shape)

x[0]

(506, 13)


array([6.320e-03, 1.800e+01, 2.310e+00, 0.000e+00, 5.380e-01, 6.575e+00,
       6.520e+01, 4.090e+00, 1.000e+00, 2.960e+02, 1.530e+01, 3.969e+02,
       4.980e+00])

In [6]:
y = boston["target"]

print(y.shape)
y[0:10]

(506,)


array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9])

In [7]:
data = pd.DataFrame(x, columns=boston["feature_names"])

data["MEDV"] = y

print(data.shape)
data.head()

(506, 14)


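|   | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|------|----|-------|------|-----|----|-----|-----|-----|-----|---------|---|-------|------|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |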


### Gradient Descent

In [10]:
x1 = x[:, 0] # CRIM
x2 = x[:, 1] # ZN
x3 = x[:, 2] # INDUS
x4 = x[:, 3] # CHAS
x5 = x[:, 4] # NOX
x6 = x[:, 5] # RM
x7 = x[:, 6] # AGE
x8 = x[:, 7] # DIS
x9 = x[:, 8] # RAD
x10 = x[:, 9] # TAX
x11 = x[:, 10] # PTRATIO
x12 = x[:, 11] # B
x13 = x[:, 12] # LSTAT
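
The thirteen assignments above can also be written as one dictionary keyed by feature name, which avoids having to remember which index belongs to which column. This is a minimal sketch: `x_demo` is a hypothetical stand-in for `x = boston["data"]`, and `feature_names` simply repeats the column order from the comments above.

```python
import numpy as np

# Feature names in the same column order as the x1 ... x13 assignments.
feature_names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]

# Hypothetical stand-in for x = boston["data"]: 4 samples, 13 features.
x_demo = np.arange(4 * 13, dtype=float).reshape(4, 13)

# Map each feature name to its column; each value is a view, not a copy.
columns = {name: x_demo[:, i] for i, name in enumerate(feature_names)}

print(columns["RM"])  # same values as x_demo[:, 5]
```

With the real data, `columns["RM"]` would play the role of `x6`.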

### Single-layer Perceptron

In [11]:
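# Linear model y_predict = w^T x + b, trained with batch gradient descent.
x_transpose = x.T  # shape (13, 506): one column per sample
w = np.random.uniform(low=-1.0, high=+1.0, size=(13, 1))  # one weight per feature
b = np.random.uniform(low=-1.0, high=+1.0)  # scalar bias

num_epoch = 100000
learning_rate = 0.000006  # small step size; the features are not scaled

for epoch in range(num_epoch):
    # Forward pass: (1, 13) @ (13, 506) gives predictions of shape (1, 506)
    y_predict = np.dot(w.T, x_transpose) + b

    # Mean absolute error, used only for monitoring and early stopping
    error = np.abs(y_predict - y).mean()

    if error < 5.0:
        break

    if epoch % 10000 == 0:
        print(f"{epoch}, error = {error:.6f}")

    # Gradient step on half the mean squared error
    w = w - learning_rate * np.dot(x_transpose, (y_predict - y).T) / len(x)
    b = b - learning_rate * (y_predict - y).mean()

print("-----" * 10)
print(f"{epoch}, error = {error:.6f}")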

0, error = 464.898188
10000, error = 5.665580
20000, error = 5.293655
30000, error = 5.108470
--------------------------------------------------
37734, error = 4.999996
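
The run above uses a learning rate of 0.000006 and stops only after tens of thousands of epochs. One common way to allow a larger step size is to standardize each feature first. The sketch below illustrates this on synthetic data, not on the Boston data: the function name `fit_linear_gd`, the generated features, and the chosen learning rate are all assumptions made for illustration, and the update rule is the same half-MSE gradient step as in the loop above.

```python
import numpy as np


def fit_linear_gd(x, y, learning_rate=0.1, num_epoch=2000):
    """Fit y ~ x @ w + b by batch gradient descent on half the mean squared error."""
    n_samples, n_features = x.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(num_epoch):
        residual = x @ w + b - y  # shape (n_samples,)
        w -= learning_rate * (x.T @ residual) / n_samples
        b -= learning_rate * residual.mean()
    return w, b


# Synthetic stand-in for the housing data: three features on very different scales.
rng = np.random.default_rng(0)
x_raw = rng.normal(loc=[0.0, 300.0, 10.0], scale=[1.0, 100.0, 5.0], size=(200, 3))
true_w = np.array([2.0, 0.05, -1.0])
y_demo = x_raw @ true_w + 4.0 + rng.normal(scale=0.1, size=200)

# Standardize every column to zero mean and unit variance.
mean, std = x_raw.mean(axis=0), x_raw.std(axis=0)
x_scaled = (x_raw - mean) / std

w_gd, b_gd = fit_linear_gd(x_scaled, y_demo)
mae = np.abs(x_scaled @ w_gd + b_gd - y_demo).mean()
print(f"MAE on standardized synthetic features: {mae:.4f}")
```

The fitted weights refer to the standardized features, so new inputs have to be transformed with the same `mean` and `std` before prediction. Whether this gives a comparable speed-up on the Boston data would need to be checked by running it there.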
