### 1 The hierarchy from libraries to AI

<img width="634" alt="image1" src="https://user-images.githubusercontent.com/49624407/90575420-4be93300-e1f6-11ea-8cd3-a4adb16ce4c5.png">

### 2 Supervised learning

1. Prepare historical data
2. Build the model structure
3. Train (fit) the model on the data
4. Use the model
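The four steps can be sketched without any deep learning library. The snippet below is a minimal illustration, assuming the five temperature/sales rows of the lemonade data that these notes print with `remon.head()`, and using NumPy's `np.polyfit` as a stand-in for the model. It is not the Keras workflow used in the later sections.

```python
import numpy as np

# 1. Prepare historical data (the five rows printed by remon.head())
temperature = np.array([20, 21, 22, 23, 24], dtype=float)
sales = np.array([40, 42, 44, 46, 48], dtype=float)

# 2. Build the model structure: a straight line, sales = w * temperature + b
degree = 1

# 3. Fit the model to the data (least squares, standing in for model.fit)
w, b = np.polyfit(temperature, sales, degree)

# 4. Use the model to predict sales for an unseen temperature
predicted = w * 15 + b
print(w, b, predicted)
```

Because this sample is exactly linear, the fitted line should recover a slope near 2 and an intercept near 0.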

### 3 Pandas

In [2]:
import pandas as pd

In [3]:
path_remon = 'https://raw.githubusercontent.com/blackdew/tensorflow1/master/csv/lemonade.csv'

remon = pd.read_csv(path_remon)

In [4]:
path_boston = 'https://raw.githubusercontent.com/blackdew/tensorflow1/master/csv/boston.csv'

boston = pd.read_csv(path_boston)

In [5]:
path_iris = 'https://raw.githubusercontent.com/blackdew/tensorflow1/master/csv/iris.csv'

iris = pd.read_csv(path_iris)

In [6]:
print(remon.shape)
print(boston.shape)
print(iris.shape)

(6, 2)
(506, 14)
(150, 5)


In [7]:
print(remon.columns)
print(boston.columns)
print(iris.columns)

Index(['온도', '판매량'], dtype='object')
Index(['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax',
       'ptratio', 'b', 'lstat', 'medv'],
      dtype='object')
Index(['꽃잎길이', '꽃잎폭', '꽃받침길이', '꽃받침폭', '품종'], dtype='object')


In [8]:
indep = remon[['온도']]
dep = remon[['판매량']]
print(indep.shape, dep.shape)
 
indep = boston[['crim', 'zn', 'indus', 'chas', 'nox', 
            'rm', 'age', 'dis', 'rad', 'tax',
            'ptratio', 'b', 'lstat']]
dep = boston[['medv']]
print(indep.shape, dep.shape)
 
indep = iris[['꽃잎길이', '꽃잎폭', '꽃받침길이', '꽃받침폭']]
dep = iris[['품종']]
print(indep.shape, dep.shape)

(6, 1) (6, 1)
(506, 13) (506, 1)
(150, 4) (150, 1)
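The double brackets in `remon[['온도']]` matter: a list of column names selects a DataFrame with a 2-D shape, whereas a single name returns a 1-D Series. A small sketch, using a hand-built three-row DataFrame in place of the CSV download:

```python
import pandas as pd

# Stand-in for the lemonade CSV: first three rows only
df = pd.DataFrame({'온도': [20, 21, 22], '판매량': [40, 42, 44]})

as_frame = df[['온도']]   # list of column names -> DataFrame, shape (3, 1)
as_series = df['온도']    # single column name  -> Series, shape (3,)

print(type(as_frame).__name__, as_frame.shape)
print(type(as_series).__name__, as_series.shape)
```

Keeping the 2-D shape makes the number of columns explicit, which is the number passed to `Input(shape=[...])` in the later sections.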


In [9]:
print(remon.head())

   온도  판매량
0  20   40
1  21   42
2  22   44
3  23   46
4  24   48


In [10]:
print(boston.head())

      crim    zn  indus  chas    nox     rm   age     dis  rad  tax  ptratio  \
0  0.00632  18.0   2.31     0  0.538  6.575  65.2  4.0900    1  296     15.3   
1  0.02731   0.0   7.07     0  0.469  6.421  78.9  4.9671    2  242     17.8   
2  0.02729   0.0   7.07     0  0.469  7.185  61.1  4.9671    2  242     17.8   
3  0.03237   0.0   2.18     0  0.458  6.998  45.8  6.0622    3  222     18.7   
4  0.06905   0.0   2.18     0  0.458  7.147  54.2  6.0622    3  222     18.7   

        b  lstat  medv  
0  396.90   4.98  24.0  
1  396.90   9.14  21.6  
2  392.83   4.03  34.7  
3  394.63   2.94  33.4  
4  396.90   5.33  36.2  


In [11]:
print(iris.head())

   꽃잎길이  꽃잎폭  꽃받침길이  꽃받침폭      품종
0   5.1  3.5    1.4   0.2  setosa
1   4.9  3.0    1.4   0.2  setosa
2   4.7  3.2    1.3   0.2  setosa
3   4.6  3.1    1.5   0.2  setosa
4   5.0  3.6    1.4   0.2  setosa


### 4 Deep learning 1 - Predicting lemonade sales

In [12]:
# Libraries
import tensorflow as tf
import pandas as pd

In [13]:
# Prepare the data
path_remon = 'https://raw.githubusercontent.com/blackdew/tensorflow1/master/csv/lemonade.csv'
remon = pd.read_csv(path_remon)
remon.head()

   온도  판매량
0  20   40
1  21   42
2  22   44
3  23   46
4  24   48


In [14]:
# Independent and dependent variables
indep = remon[['온도']]
dep = remon[['판매량']]
print(indep.shape, dep.shape)

(6, 1) (6, 1)


In [15]:
# Build the model

# shape=[1]: one independent variable, '온도' (temperature)
X = tf.keras.layers.Input(shape=[1])

# Dense(1): one dependent variable, '판매량' (sales)
Y = tf.keras.layers.Dense(1)(X)

model = tf.keras.models.Model(X, Y)
model.compile(loss='mse')

In [24]:
# Train the model

# epochs=1000: repeat training over the data 1000 times
model.fit(indep, dep, epochs=1000, verbose=0)

<tensorflow.python.keras.callbacks.History at 0x14a1fbd90>

In [25]:
# Use the model
print(model.predict([[15]]))

[[30.91824]]


* Loss  
    The mean of (prediction - actual)^2, i.e. the mean squared error (`mse`) passed to `model.compile`
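As a worked example of that loss, the sketch below computes the mean squared error by hand with NumPy. The `actual` values are the sales printed by `remon.head()`; the `predicted` values are made up for illustration and are not outputs of the trained model.

```python
import numpy as np

actual = np.array([40, 42, 44, 46, 48], dtype=float)      # sales from remon.head()
predicted = np.array([41, 42, 43, 46, 50], dtype=float)   # hypothetical model outputs

squared_errors = (predicted - actual) ** 2   # [1, 0, 1, 0, 4]
mse = squared_errors.mean()                  # (1 + 0 + 1 + 0 + 4) / 5
print(mse)  # 1.2
```

Training lowers this number by adjusting the weights, so a smaller loss means the predictions sit closer to the recorded values.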

### 5 Deep learning 2 - Predicting Boston house prices

**Perceptron**

```y = w1x1 + w2x2 + ... + w13x13 + b```

w : weight  
b : bias

In [1]:
# Libraries
import tensorflow as tf
import pandas as pd

In [2]:
# Prepare historical data
path_boston = 'https://raw.githubusercontent.com/blackdew/tensorflow1/master/csv/boston.csv'
boston = pd.read_csv(path_boston)
print(boston.columns)
boston.head()

Index(['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax',
       'ptratio', 'b', 'lstat', 'medv'],
      dtype='object')


      crim    zn  indus  chas    nox     rm   age     dis  rad  tax  ptratio  \
0  0.00632  18.0   2.31     0  0.538  6.575  65.2  4.0900    1  296     15.3
1  0.02731   0.0   7.07     0  0.469  6.421  78.9  4.9671    2  242     17.8
2  0.02729   0.0   7.07     0  0.469  7.185  61.1  4.9671    2  242     17.8
3  0.03237   0.0   2.18     0  0.458  6.998  45.8  6.0622    3  222     18.7
4  0.06905   0.0   2.18     0  0.458  7.147  54.2  6.0622    3  222     18.7

        b  lstat  medv
0  396.90   4.98  24.0
1  396.90   9.14  21.6
2  392.83   4.03  34.7
3  394.63   2.94  33.4
4  396.90   5.33  36.2


In [3]:
# Split into independent and dependent variables
indep_boston = boston[['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax',
            'ptratio', 'b', 'lstat']]
dep_boston = boston[['medv']]
print(indep_boston.shape, dep_boston.shape)

(506, 13) (506, 1)


In [4]:
# Build the model structure
X = tf.keras.layers.Input(shape=[13])
Y = tf.keras.layers.Dense(1)(X)
model = tf.keras.models.Model(X, Y)
model.compile(loss='mse')

In [5]:
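# Train (fit) the model on the data
# First 1000 epochs silently (verbose=0), then 10 more with the training log shown
model.fit(indep_boston, dep_boston, epochs=1000, verbose=0)
model.fit(indep_boston, dep_boston, epochs=10)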

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x13d279cd0>

In [8]:
# 4. Use the model
print(model.predict(indep_boston[5:10]))

# Check the dependent variable
print(dep_boston[5:10])

[[25.59502 ]
 [20.379963]
 [17.264088]
 [ 8.591943]
 [16.972878]]
   medv
5  28.7
6  22.9
7  27.1
8  16.5
9  18.9


* **For row 5, the model predicted 25.59502 against an actual value of 28.7**

In [9]:
# Inspect the model's formula
print(model.get_weights())

[array([[-0.08703752],
       [ 0.07193476],
       [-0.05149634],
       [ 3.2933924 ],
       [ 2.0239818 ],
       [ 3.8279703 ],
       [ 0.01914695],
       [-0.8023868 ],
       [ 0.15168568],
       [-0.0102657 ],
       [ 0.00665527],
       [ 0.0151636 ],
       [-0.5945045 ]], dtype=float32), array([2.7299166], dtype=float32)]


**Formula**  
```medv = -0.08703752 * x1 + 0.07193476 * x2 + ... - 0.5945045 * x13 + 2.7299166```
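To make the formula concrete, the sketch below plugs row 0 of `boston.head()` into the weights and bias printed by `model.get_weights()`. The numbers are copied from the outputs in these notes; treating the first array as `w` and the second as `b` follows the perceptron equation at the top of this section.

```python
import numpy as np

# Weights and bias copied from the model.get_weights() output
w = np.array([-0.08703752, 0.07193476, -0.05149634, 3.2933924, 2.0239818,
              3.8279703, 0.01914695, -0.8023868, 0.15168568, -0.0102657,
              0.00665527, 0.0151636, -0.5945045])
b = 2.7299166

# Row 0 of boston.head():
# crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, b, lstat
x = np.array([0.00632, 18.0, 2.31, 0, 0.538, 6.575, 65.2, 4.09, 1, 296,
              15.3, 396.90, 4.98])

medv_hat = float(np.dot(w, x) + b)   # y = w1*x1 + ... + w13*x13 + b
print(medv_hat)
```

The recorded `medv` for that row is 24.0, so the printed value can be compared against it directly.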

### 6 Deep learning 3 - Classifying iris species

**Classification**

#### One-hot encoding

The process of converting categorical data into 1/0 data.
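A minimal sketch of one-hot encoding with `pd.get_dummies`, on a hypothetical three-row sample rather than the full iris CSV. The `dtype=int` argument is an assumption added here to keep the 1/0 form, since recent pandas versions return True/False by default.

```python
import pandas as pd

# Hypothetical three-row sample; '품종' holds species names as strings
df = pd.DataFrame({'꽃잎길이': [5.1, 7.0, 6.3],
                   '품종': ['setosa', 'versicolor', 'virginica']})

# Each distinct value of '품종' becomes its own 1/0 column
encoded = pd.get_dummies(df, dtype=int)
print(encoded)
```

Numeric columns pass through unchanged, and each new column is named after the original column plus the category value.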

Sigmoid  
Softmax : used to predict in terms of proportions, e.g. a 30% chance of rain

In [10]:
# Libraries
import tensorflow as tf
import pandas as pd

In [11]:
# 1. Prepare historical data
path_iris = 'https://raw.githubusercontent.com/blackdew/tensorflow1/master/csv/iris.csv'
iris = pd.read_csv(path_iris)
iris.head()

   꽃잎길이  꽃잎폭  꽃받침길이  꽃받침폭      품종
0   5.1  3.5    1.4   0.2  setosa
1   4.9  3.0    1.4   0.2  setosa
2   4.7  3.2    1.3   0.2  setosa
3   4.6  3.1    1.5   0.2  setosa
4   5.0  3.6    1.4   0.2  setosa


In [13]:
# One-hot encoding
iris = pd.get_dummies(iris)
iris.head()

   꽃잎길이  꽃잎폭  꽃받침길이  꽃받침폭  품종_setosa  품종_versicolor  품종_virginica
0   5.1  3.5    1.4   0.2          1              0             0
1   4.9  3.0    1.4   0.2          1              0             0
2   4.7  3.2    1.3   0.2          1              0             0
3   4.6  3.1    1.5   0.2          1              0             0
4   5.0  3.6    1.4   0.2          1              0             0


In [15]:
print(iris.columns)

Index(['꽃잎길이', '꽃잎폭', '꽃받침길이', '꽃받침폭', '품종_setosa', '품종_versicolor',
       '품종_virginica'],
      dtype='object')


In [16]:
# Independent and dependent variables
indep_iris = iris[['꽃잎길이', '꽃잎폭', '꽃받침길이', '꽃받침폭']]
dep_iris = iris[['품종_setosa', '품종_versicolor', '품종_virginica']]
print(indep_iris.shape, dep_iris.shape)

(150, 4) (150, 3)


In [19]:
# 2. Build the model structure

# 4 independent variables
X = tf.keras.layers.Input(shape=[4])

# 3 dependent variables, softmax activation
Y = tf.keras.layers.Dense(3, activation='softmax')(X)

model = tf.keras.models.Model(X, Y)

# Loss used for classification: categorical_crossentropy; also report accuracy during training
model.compile(loss='categorical_crossentropy',
              metrics=['accuracy'])

* Loss used for regression: mse  
```model.compile(loss='mse')```

<img width="1321" alt="스크린샷 2020-08-24 오전 5 19 47" src="https://user-images.githubusercontent.com/49624407/90987914-ab19bf80-e5c9-11ea-823b-5e49f5c61ba1.png">

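In regression, the dependent variable has an unbounded range.

In the second case, the output means a 70% probability of setosa, a 30% probability of virginica, and a 0% probability of versicolor.  
**In other words, in classification each output lies between 0 and 1, so the softmax function is used.**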

<img width="1169" alt="스크린샷 2020-08-24 오전 5 20 11" src="https://user-images.githubusercontent.com/49624407/90987999-59be0000-e5ca-11ea-9811-fc656dffb4f9.png">

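In practice every perceptron applies a function to its weighted sum: the regression models above used the identity function,  
while this classification model uses the softmax function.

**Such functions are called activation functions.**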

In [20]:
# Train (fit) the model on the data
model.fit(indep_iris, dep_iris, epochs=1000, verbose=0)
model.fit(indep_iris, dep_iris, epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x13d4f9550>

In [22]:
# 4. Use the model

# The first 5 rows
print(model.predict(indep_iris[:5]))
print(dep_iris[:5])

[[9.9919564e-01 8.0427743e-04 7.4214711e-08]
 [9.9640852e-01 3.5907221e-03 6.8859299e-07]
 [9.9848002e-01 1.5197721e-03 2.6430098e-07]
 [9.9612290e-01 3.8759054e-03 1.1601403e-06]
 [9.9938834e-01 6.1163493e-04 5.8230810e-08]]
   품종_setosa  품종_versicolor  품종_virginica
0          1              0             0
1          1              0             0
2          1              0             0
3          1              0             0
4          1              0             0


In [23]:
# The last 5 rows
print(model.predict(indep_iris[-5:]))
print(dep_iris[-5:])

[[6.9470053e-07 1.3295156e-01 8.6704773e-01]
 [1.4940694e-06 2.1796234e-01 7.8203619e-01]
 [2.8624149e-06 2.3466112e-01 7.6533604e-01]
 [6.5233371e-07 7.9616256e-02 9.2038310e-01]
 [6.4747164e-06 2.4015704e-01 7.5983644e-01]]
     품종_setosa  품종_versicolor  품종_virginica
145          0              0             1
146          0              0             1
147          0              0             1
148          0              0             1
149          0              0             1


In [24]:
# Print the weights and bias
print(model.get_weights())

[array([[ 0.31274942,  0.12852718, -1.1398021 ],
       [ 3.7125852 ,  0.78835934, -0.16735694],
       [-3.4145813 , -0.2758899 ,  1.3270314 ],
       [-4.5348315 , -1.5860738 ,  1.5692662 ]], dtype=float32), array([ 1.9464889,  1.0121317, -1.3402766], dtype=float32)]


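* Formula for classifying the setosa class (`y1` is the first component of the softmax taken over all three class scores `z1`, `z2`, `z3`)  
``` y1 = softmax(z1, z2, z3)[0], where z1 = 0.31274942 * x1 + 3.7125852 * x2 - 3.4145813 * x3 - 4.5348315 * x4 + 1.9464889 ```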
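Softmax is computed over all three class scores at once, so a full calculation needs every column of the weight matrix. The NumPy sketch below uses the weights and biases printed by `model.get_weights()` and row 0 of the iris data; subtracting the maximum score before exponentiating is a common stability trick added here, not something shown in the notes.

```python
import numpy as np

# Weights (4 inputs x 3 classes) and biases copied from model.get_weights()
W = np.array([[ 0.31274942,  0.12852718, -1.1398021 ],
              [ 3.7125852 ,  0.78835934, -0.16735694],
              [-3.4145813 , -0.2758899 ,  1.3270314 ],
              [-4.5348315 , -1.5860738 ,  1.5692662 ]])
b = np.array([1.9464889, 1.0121317, -1.3402766])

x = np.array([5.1, 3.5, 1.4, 0.2])   # row 0 of the iris data

z = x @ W + b                         # one weighted sum (score) per class
exp_z = np.exp(z - z.max())           # subtract the max for numerical stability
probs = exp_z / exp_z.sum()           # softmax: values in (0, 1) that sum to 1
print(probs)
```

The result should land close to the first row printed by `model.predict(indep_iris[:5])`, with nearly all of the probability on setosa.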

### 7 Deep learning 4 - Completing the neural network: hidden layers

<img width="613" alt="스크린샷 2020-08-24 오전 6 07 19" src="https://user-images.githubusercontent.com/49624407/90988748-21b9bb80-e5d0-11ea-8c95-1ab551268a6f.png">

In [26]:
# Libraries
import tensorflow as tf
import pandas as pd

In [31]:
# 1. Prepare historical data
path_boston = 'https://raw.githubusercontent.com/blackdew/tensorflow1/master/csv/boston.csv'
boston = pd.read_csv(path_boston)
print(boston.columns)
boston.head()

Index(['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax',
       'ptratio', 'b', 'lstat', 'medv'],
      dtype='object')


      crim    zn  indus  chas    nox     rm   age     dis  rad  tax  ptratio  \
0  0.00632  18.0   2.31     0  0.538  6.575  65.2  4.0900    1  296     15.3
1  0.02731   0.0   7.07     0  0.469  6.421  78.9  4.9671    2  242     17.8
2  0.02729   0.0   7.07     0  0.469  7.185  61.1  4.9671    2  242     17.8
3  0.03237   0.0   2.18     0  0.458  6.998  45.8  6.0622    3  222     18.7
4  0.06905   0.0   2.18     0  0.458  7.147  54.2  6.0622    3  222     18.7

        b  lstat  medv
0  396.90   4.98  24.0
1  396.90   9.14  21.6
2  392.83   4.03  34.7
3  394.63   2.94  33.4
4  396.90   5.33  36.2


In [28]:
# Split into independent and dependent variables
indep_boston = boston[['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax',
            'ptratio', 'b', 'lstat']]
dep_boston = boston[['medv']]
print(indep_boston.shape, dep_boston.shape)

(506, 13) (506, 1)


In [29]:
# 2. Build the model structure
X = tf.keras.layers.Input(shape=[13])

# Create a hidden layer with 10 units, activation function: swish
H = tf.keras.layers.Dense(10, activation='swish')(X)
Y = tf.keras.layers.Dense(1)(H)
model = tf.keras.models.Model(X, Y)
model.compile(loss='mse')
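The `swish` activation used in the hidden layer is `x * sigmoid(x)`. A NumPy sketch of that function follows; the helper name `swish` is chosen here for illustration and this is not the Keras implementation itself.

```python
import numpy as np

def swish(x):
    """swish(x) = x * sigmoid(x) = x / (1 + exp(-x))."""
    return x / (1.0 + np.exp(-x))

values = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(swish(values))
```

For large positive inputs it behaves almost like the identity, and for large negative inputs it approaches 0, which gives the hidden layer a non-linear response.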

In [30]:
# Inspect the model structure
model.summary()

Model: "functional_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_4 (InputLayer)         [(None, 13)]              0         
_________________________________________________________________
dense_3 (Dense)              (None, 10)                140       
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 11        
Total params: 151
Trainable params: 151
Non-trainable params: 0
_________________________________________________________________


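* Param : number of trainable values (weights and biases)  

Param 140
> 13 input variables from the InputLayer => w1, w2, ... w13  
> 1 bias => b  
> 10 units in the hidden layer => h1, h2, ... h10  
>  
> **14 * 10 = 140**

Param 11
> 10 inputs from the hidden layer plus 1 bias, for 1 output unit  
>  
> **11 * 1 = 11**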
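The same arithmetic as a small helper, covering both Dense layers in the summary. The function name `dense_params` is hypothetical and only illustrates the counting rule.

```python
def dense_params(n_inputs, n_units):
    """Each unit has one weight per input plus one bias."""
    return (n_inputs + 1) * n_units

hidden = dense_params(13, 10)   # dense_3: (13 + 1) * 10
output = dense_params(10, 1)    # dense_4: (10 + 1) * 1
print(hidden, output, hidden + output)   # 140 11 151
```

The total matches the `Total params: 151` line of `model.summary()`.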
      

In [34]:
# Train (fit) the model on the data
model.fit(indep_boston, dep_boston, epochs=1000, verbose=0)
model.fit(indep_boston, dep_boston, epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x13d6e4b50>

In [35]:
# 4. Use the model
print(model.predict(indep_boston[5:10]))

# Check the dependent variable
print(dep_boston[5:10])

[[27.779905]
 [24.813261]
 [21.803938]
 [13.679838]
 [21.217634]]
   medv
5  28.7
6  22.9
7  27.1
8  16.5
9  18.9
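One way to compare this output with the earlier single-layer model is to compute the mean squared error of each on the same five rows. The sketch below copies both sets of predictions from the printed outputs in these notes. Five training rows are far too few for a real evaluation, so this is only a rough sanity check.

```python
import numpy as np

actual = np.array([28.7, 22.9, 27.1, 16.5, 18.9])   # medv, rows 5-9

# Predictions copied from the two printed outputs in these notes
linear_pred = np.array([25.59502, 20.379963, 17.264088, 8.591943, 16.972878])
hidden_pred = np.array([27.779905, 24.813261, 21.803938, 13.679838, 21.217634])

mse_linear = np.mean((linear_pred - actual) ** 2)   # model without a hidden layer
mse_hidden = np.mean((hidden_pred - actual) ** 2)   # model with a hidden layer
print(mse_linear, mse_hidden)
```

On these rows the hidden-layer model has the smaller error, which is consistent with the predictions sitting visibly closer to the `medv` column.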


In [37]:
# Inspect the model's formula
print(model.get_weights())

[array([[-0.2263746 ,  1.7714092 , -0.3354338 , -0.43518877,  0.08903422,
         0.09096034,  0.03854512, -0.15376803, -0.4506066 , -0.28057966],
       [-0.10413948,  0.06881601, -0.23484224, -0.32599705, -0.22395189,
         0.46295482,  0.01958755, -0.22187424, -0.29141697, -0.10684466],
       [-0.46266714, -0.22097953, -0.00386952, -0.30593503, -0.04044988,
         0.6765402 ,  0.11271285, -0.33109546, -0.3786279 ,  0.11234149],
       [ 0.16216528,  0.89917874, -1.785193  ,  0.02710807, -1.2442421 ,
        -1.5206581 , -1.5542879 ,  0.29113787,  0.2771349 ,  0.98874766],
       [-0.17270207,  0.03965821, -0.1246333 , -0.02260378, -0.11973839,
         0.2376907 ,  0.23079792,  0.39814436, -0.40520114,  0.31922466],
       [ 0.25286442, -0.2193042 , -2.3916132 , -0.2820711 , -2.6174226 ,
        -1.9777483 , -2.904242  , -0.35235596,  0.23925102,  2.6647239 ],
       [ 0.21816891,  0.01247635,  0.08751102,  0.01106358, -0.35128462,
         0.03053562,  0.06235793, -0.0745481

### 8 Tips for data

In [38]:
# Import the library
import pandas as pd

In [39]:
# Read the file
path_iris_2 = 'https://raw.githubusercontent.com/blackdew/tensorflow1/master/csv/iris2.csv'
iris_2 = pd.read_csv(path_iris_2)
iris_2.head()

   꽃잎길이  꽃잎폭  꽃받침길이  꽃받침폭  품종
0   5.1  3.5    1.4   0.2   0
1   4.9  3.0    1.4   0.2   0
2   4.7  3.2    1.3   0.2   0
3   4.6  3.1    1.5   0.2   0
4   5.0  3.6    1.4   0.2   0


In [40]:
# Check the data type of each column
print(iris_2.dtypes)

꽃잎길이     float64
꽃잎폭      float64
꽃받침길이    float64
꽃받침폭     float64
품종         int64
dtype: object


In [43]:
# Confirm that nothing gets one-hot encoded: get_dummies skips int columns by default
iris_2 = pd.get_dummies(iris_2)
iris_2.head()

   꽃잎길이  꽃잎폭  꽃받침길이  꽃받침폭  품종
0   5.1  3.5    1.4   0.2   0
1   4.9  3.0    1.4   0.2   0
2   4.7  3.2    1.3   0.2   0
3   4.6  3.1    1.5   0.2   0
4   5.0  3.6    1.4   0.2   0


In [44]:
# Convert the 품종 (species) column to the category type.
iris_2['품종'] = iris_2['품종'].astype('category')
print(iris_2.dtypes)

꽃잎길이      float64
꽃잎폭       float64
꽃받침길이     float64
꽃받침폭      float64
품종       category
dtype: object


In [47]:
# One-hot encoding: category and object columns are encoded
iris_2 = pd.get_dummies(iris_2)
iris_2.head()

   꽃잎길이  꽃잎폭  꽃받침길이  꽃받침폭  품종_0  품종_1  품종_2
0   5.1  3.5    1.4   0.2     1     0     0
1   4.9  3.0    1.4   0.2     1     0     0
2   4.7  3.2    1.3   0.2     1     0     0
3   4.6  3.1    1.5   0.2     1     0     0
4   5.0  3.6    1.4   0.2     1     0     0
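As an alternative to changing the dtype first, `pd.get_dummies` also accepts a `columns` argument naming the columns to encode, which applies even to integer columns. The sketch below uses a hypothetical three-row stand-in for `iris_2`; `dtype=int` is added to keep 1/0 output on recent pandas versions.

```python
import pandas as pd

# Hypothetical stand-in for iris_2: '품종' is stored as integers
df = pd.DataFrame({'꽃잎길이': [5.1, 7.0, 6.3], '품종': [0, 1, 2]})

skipped = pd.get_dummies(df)                                # int column left as-is
encoded = pd.get_dummies(df, columns=['품종'], dtype=int)   # explicitly encoded

print(skipped.columns.tolist())
print(encoded.columns.tolist())
```

Either approach yields the same `품종_0`, `품종_1`, `품종_2` columns; naming the column explicitly just avoids modifying the original dtype.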


In [50]:
# Check for NA values
iris_2.isna().sum()

꽃잎길이     0
꽃잎폭      1
꽃받침길이    0
꽃받침폭     0
품종_0     0
품종_1     0
품종_2     0
dtype: int64

In [51]:
iris_2.tail()

     꽃잎길이  꽃잎폭  꽃받침길이  꽃받침폭  품종_0  품종_1  품종_2
145   6.7  3.0    5.2   2.3     0     0     1
146   6.3  2.5    5.0   1.9     0     0     1
147   6.5  3.0    5.2   2.0     0     0     1
148   6.2  3.4    5.4   2.3     0     0     1
149   5.9  NaN    5.1   1.8     0     0     1


In [52]:
# Fill the NA value with the mean of the 꽃잎폭 column
mean = iris_2['꽃잎폭'].mean()
print(mean)
iris_2['꽃잎폭'] = iris_2['꽃잎폭'].fillna(mean)
iris_2.tail()

3.0543624161073826


     꽃잎길이       꽃잎폭  꽃받침길이  꽃받침폭  품종_0  품종_1  품종_2
145   6.7       3.0    5.2   2.3     0     0     1
146   6.3       2.5    5.0   1.9     0     0     1
147   6.5       3.0    5.2   2.0     0     0     1
148   6.2       3.4    5.4   2.3     0     0     1
149   5.9  3.054362    5.1   1.8     0     0     1
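The same pattern on a tiny hypothetical Series shows the two details that make it work: `mean()` skips missing values by default, and `fillna` returns a new object rather than changing the original in place.

```python
import numpy as np
import pandas as pd

widths = pd.Series([3.0, 2.5, 3.5, np.nan])   # hypothetical values with one gap

mean = widths.mean()            # NaN is skipped: (3.0 + 2.5 + 3.5) / 3 = 3.0
filled = widths.fillna(mean)    # returns a new Series; widths is unchanged

print(mean)
print(filled.tolist())
```

That is why the notes assign the result back to `iris_2['꽃잎폭']` instead of calling `fillna` on its own.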


### 9 Tips for models

In [53]:
# Model structure used so far
X = tf.keras.layers.Input(shape=[4])
H = tf.keras.layers.Dense(8, activation='swish')(X)
H = tf.keras.layers.Dense(8, activation='swish')(H)
H = tf.keras.layers.Dense(8, activation='swish')(H)
Y = tf.keras.layers.Dense(3, activation='softmax')(H)
model = tf.keras.models.Model(X, Y)
model.compile(loss='categorical_crossentropy',
              metrics=['accuracy'])

In [54]:
# Build the model structure with BatchNormalization layers.
X = tf.keras.layers.Input(shape=[4])
 
H = tf.keras.layers.Dense(8)(X)
H = tf.keras.layers.BatchNormalization()(H)
H = tf.keras.layers.Activation('swish')(H)
 
H = tf.keras.layers.Dense(8)(H)
H = tf.keras.layers.BatchNormalization()(H)
H = tf.keras.layers.Activation('swish')(H)
 
H = tf.keras.layers.Dense(8)(H)
H = tf.keras.layers.BatchNormalization()(H)
H = tf.keras.layers.Activation('swish')(H)
 
Y = tf.keras.layers.Dense(3, activation='softmax')(H)
model = tf.keras.models.Model(X, Y)
model.compile(loss='categorical_crossentropy',
              metrics=['accuracy'])
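Roughly, during training a BatchNormalization layer standardizes each feature over the current batch and then applies a learned scale and shift. The NumPy sketch below shows only that training-time computation, with `gamma`, `beta`, and `eps` as illustrative parameters. The real layer learns `gamma` and `beta` and also tracks moving averages for use at inference, which this sketch omits.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-3):
    """Standardize each column over the batch, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# Hypothetical Dense outputs: a batch of 4 rows, 2 features on different scales
h = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0],
              [4.0, 400.0]])

normalized = batch_norm(h)
print(normalized.mean(axis=0))   # each column is centered near 0
print(normalized.std(axis=0))    # each column has spread near 1
```

Placing the normalization between `Dense` and `Activation`, as in the cell above, means the activation receives inputs on a similar scale regardless of how the raw features were scaled.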