# Part-2 DNN

## Lab-08-1 Perceptron
- Perceptron
- Linear classifier
- AND, OR, XOR gates

### Neuron
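- An artificial neural network is a network modeled on the neurons of the human brain.
- A neuron emits an output signal when the sum of its input signals exceeds a threshold.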
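The threshold rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not the lab's code: the `perceptron` helper and the hand-picked weights and biases are assumptions chosen for the example, and the bias plays the role of a negative threshold.

```python
def perceptron(inputs, weights, bias):
    """Return 1 if the weighted input sum plus bias exceeds 0, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

# Hand-picked (hypothetical) parameters; many other choices work too.
AND_PARAMS = ((1.0, 1.0), -1.5)  # fires only when both inputs are 1
OR_PARAMS = ((1.0, 1.0), -0.5)   # fires when at least one input is 1

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

and_outputs = [perceptron(x, *AND_PARAMS) for x in INPUTS]
or_outputs = [perceptron(x, *OR_PARAMS) for x in INPUTS]

print("AND:", and_outputs)
print("OR: ", or_outputs)
```

Both gates are linearly separable, so a single line in the input plane divides the 0 outputs from the 1 outputs. No such line exists for XOR, which is what the experiment in this lab runs into.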

In [7]:
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

In [8]:
# XOR
X = torch.FloatTensor([[0, 0], [0, 1], [1, 0], [1, 1]]).to(device)
Y = torch.FloatTensor([[0], [1], [1], [0]]).to(device)

# nn layers
linear = torch.nn.Linear(2, 1, bias=True)
sigmoid = torch.nn.Sigmoid()

model = torch.nn.Sequential(linear, sigmoid).to(device)

# define cost/loss and optimizer
criterion = torch.nn.BCELoss().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1)

for step in range(10001):
    optimizer.zero_grad()
    hypothesis = model(X)
    
    cost = criterion(hypothesis, Y)
    cost.backward()
    optimizer.step()
    
    if step%1000==0:
        print(step, cost.item())
    

0 0.7055874466896057
1000 0.6931471824645996
2000 0.6931471824645996
3000 0.6931471824645996
4000 0.6931471824645996
5000 0.6931471824645996
6000 0.6931471824645996
7000 0.6931471824645996
8000 0.6931471824645996
9000 0.6931471824645996
10000 0.6931471824645996


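The loss does not decrease, so training is not making progress: a single perceptron cannot solve XOR. The loss stays at about 0.6931 ≈ ln 2, which is the binary cross-entropy of predicting 0.5 for every input.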
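The plateau value in the output above is consistent with a model that has collapsed to a constant prediction. The sketch below, written in plain Python rather than PyTorch, assumes zero weights and a zero bias (an assumption for illustration, not values read from the trained model) and computes the resulting mean binary cross-entropy on the XOR data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(predictions, targets):
    """Mean binary cross-entropy over paired predictions and targets."""
    losses = [-(t * math.log(p) + (1 - t) * math.log(1 - p))
              for p, t in zip(predictions, targets)]
    return sum(losses) / len(losses)

XOR_INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
XOR_TARGETS = [0, 1, 1, 0]

# Assumed collapsed parameters: zero weights and zero bias.
w, b = (0.0, 0.0), 0.0
predictions = [sigmoid(x[0] * w[0] + x[1] * w[1] + b) for x in XOR_INPUTS]
plateau = bce(predictions, XOR_TARGETS)

print(predictions)            # every prediction is 0.5
print(plateau, math.log(2))   # both values are ln 2
```

With a single linear layer the loss is convex in the parameters, and on the symmetric XOR data the gradient vanishes at zero weights and zero bias, so ln 2 is the best loss this model can reach.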

## Lab-08-2 Multi Layer Perceptron
- Multi-layer perceptron (MLP)
- Backpropagation

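Multiple layers can be trained with backpropagation: the chain rule is used to compute the gradient of the loss with respect to every weight, and gradient descent then uses those gradients to reduce the loss.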
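The chain rule mentioned above can be written out for a two-layer sigmoid network. This is a sketch under assumed notation: the symbols $l_1, a_1, l_2, \hat{y}$ are chosen to mirror the variable names in this lab's code, $\sigma$ is the sigmoid, $\odot$ is the element-wise product, and $L$ is the binary cross-entropy of a single sample.

$$
\begin{aligned}
l_1 &= X w_1 + b_1, \quad a_1 = \sigma(l_1), \quad l_2 = a_1 w_2 + b_2, \quad \hat{y} = \sigma(l_2) \\
L &= -\left[ y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \right] \\
\frac{\partial L}{\partial \hat{y}} &= \frac{\hat{y} - y}{\hat{y}(1 - \hat{y})} \\
\frac{\partial L}{\partial l_2} &= \frac{\partial L}{\partial \hat{y}} \odot \sigma'(l_2) = \hat{y} - y \\
\frac{\partial L}{\partial w_2} &= a_1^{\top} \frac{\partial L}{\partial l_2}, \qquad \frac{\partial L}{\partial b_2} = \frac{\partial L}{\partial l_2} \\
\frac{\partial L}{\partial a_1} &= \frac{\partial L}{\partial l_2} \, w_2^{\top} \\
\frac{\partial L}{\partial l_1} &= \frac{\partial L}{\partial a_1} \odot \sigma'(l_1) \\
\frac{\partial L}{\partial w_1} &= X^{\top} \frac{\partial L}{\partial l_1}, \qquad \frac{\partial L}{\partial b_1} = \frac{\partial L}{\partial l_1}
\end{aligned}
$$

Each gradient reuses the one computed just before it, which is why the backward pass runs from the output layer toward the input layer.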

In [9]:
# backpropagation
X = torch.FloatTensor([[0, 0], [0, 1], [1, 0], [1, 1]]).to(device)
Y = torch.FloatTensor([[0], [1], [1], [0]]).to(device)

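# nn layers: draw initial weights from a standard normal distribution
# (torch.Tensor(2, 2) would return uninitialized memory)
w1 = torch.randn(2, 2, device=device)
b1 = torch.randn(2, device=device)
w2 = torch.randn(2, 1, device=device)
b2 = torch.randn(1, device=device)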

def sigmoid(x):
    return 1.0 / (1.0 + torch.exp(-x))

def sigmoid_prime(x):
    return sigmoid(x) * (1-sigmoid(x))
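
The identity used by `sigmoid_prime` above, σ'(x) = σ(x)(1 − σ(x)), can be checked numerically. The sketch below is a scalar, standard-library version written for illustration only; `numeric_derivative` and the sample points are assumptions, not part of the lab.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def numeric_derivative(f, x, eps=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

# Largest gap between the analytic and numeric derivative on a few points.
max_error = max(
    abs(sigmoid_prime(x) - numeric_derivative(sigmoid, x))
    for x in [-3.0, -1.0, 0.0, 0.5, 2.0]
)
print(max_error)
```

A finite-difference comparison like this is a cheap sanity check before trusting a hand-written backward pass.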

In [12]:
lr = 1

for step in range(10001):
    # forward
    l1 = torch.add(torch.matmul(X, w1), b1)
    a1 = sigmoid(l1)
    l2 = torch.add(torch.matmul(a1, w2), b2)
    Y_pred = sigmoid(l2)
    # BCE
    cost = -torch.mean(Y*torch.log(Y_pred) + (1-Y) * torch.log(1-Y_pred))
    
    # backward pass (chain rule)
    # derivative of BCE w.r.t. Y_pred; 1e-7 guards against division by zero
    d_Y_pred = (Y_pred-Y) / (Y_pred * (1.0-Y_pred) + 1e-7)
    
    # Layer 2
    d_l2 = d_Y_pred * sigmoid_prime(l2)
    d_b2 = d_l2
    d_w2 = torch.matmul(torch.transpose(a1, 0, 1), d_b2)
    
    # Layer 1
    d_a1 = torch.matmul(d_b2, torch.transpose(w2, 0, 1))
    d_l1 = d_a1 * sigmoid_prime(l1)
    d_b1 = d_l1
    d_w1 = torch.matmul(torch.transpose(X, 0, 1), d_b1)
    
    # weight update (step)
    w1 = w1 - lr*d_w1
    b1 = b1 - lr*torch.mean(d_b1, 0)
    w2 = w2 - lr*d_w2
    b2 = b2 - lr*torch.mean(d_b2, 0)
    
    if step%1000==0:
        print(step, cost.item())

0 0.34671515226364136
1000 0.3467007279396057
2000 0.3466891646385193
3000 0.34667932987213135
4000 0.3466709852218628
5000 0.3466639220714569
6000 0.346657931804657
7000 0.34665271639823914
8000 0.34664785861968994
9000 0.3466435968875885
10000 0.3466399610042572


### Code: xor-nn

In [13]:
X = torch.FloatTensor([[0, 0], [0, 1], [1, 0], [1, 1]]).to(device)
Y = torch.FloatTensor([[0], [1], [1], [0]]).to(device)
# nn Layers
linear1 = torch.nn.Linear(2, 2, bias=True)
linear2 = torch.nn.Linear(2, 1, bias=True)
sigmoid = torch.nn.Sigmoid()
model = torch.nn.Sequential(linear1, sigmoid, linear2, sigmoid).to(device)
# define cost/loss and optimizer
criterion = torch.nn.BCELoss().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1)
for step in range(10001):
    optimizer.zero_grad()
    hypothesis = model(X)
    # cost/loss function
    cost = criterion(hypothesis, Y)
    cost.backward()
    optimizer.step()
    
    if step%300==0:
        print(step, cost.item())

0 0.715934157371521
300 0.596375584602356
600 0.3768468201160431
900 0.35934001207351685
1200 0.3544251024723053
1500 0.35218489170074463
1800 0.3509174585342407
2100 0.35010701417922974
2400 0.34954583644866943
2700 0.34913522005081177
3000 0.34882205724716187
3300 0.3485758304595947
3600 0.3483772873878479
3900 0.348213791847229
4200 0.3480769991874695
4500 0.34796082973480225
4800 0.34786105155944824
5100 0.3477745056152344
5400 0.34769824147224426
5700 0.34763121604919434
6000 0.34757161140441895
6300 0.34751802682876587
6600 0.3474700450897217
6900 0.34742674231529236
7200 0.34738701581954956
7500 0.34735098481178284
7800 0.34731805324554443
8100 0.34728747606277466
8400 0.3472594618797302
8700 0.347233384847641
9000 0.3472091555595398
9300 0.3471869230270386
9600 0.3471660315990448
9900 0.34714627265930176


### Code: xor-nn-wide-deep

In [16]:
X = torch.FloatTensor([[0, 0], [0, 1], [1, 0], [1, 1]]).to(device)
Y = torch.FloatTensor([[0], [1], [1], [0]]).to(device)

# nn Layers
linear1 = torch.nn.Linear(2, 10, bias=True)
linear2 = torch.nn.Linear(10, 10, bias=True)
linear3 = torch.nn.Linear(10, 10, bias=True)
linear4 = torch.nn.Linear(10, 1, bias=True)
sigmoid = torch.nn.Sigmoid()

model = torch.nn.Sequential(linear1, sigmoid, linear2, sigmoid, linear3, sigmoid, linear4, sigmoid).to(device)
# define cost/loss and optimizer
criterion = torch.nn.BCELoss().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1)

for step in range(10001):
    optimizer.zero_grad()
    hypothesis = model(X)
    # cost/loss function
    cost = criterion(hypothesis, Y)
    cost.backward()
    optimizer.step()
    
    if step%100==0:
        print(step, cost.item())

0 0.7164230346679688
100 0.6931554079055786
200 0.6931540966033936
300 0.6931527853012085
400 0.693151593208313
500 0.6931504607200623
600 0.6931492686271667
700 0.693148136138916
800 0.6931470632553101
900 0.6931459307670593
1000 0.6931447982788086
1100 0.6931437253952026
1200 0.6931427121162415
1300 0.6931416392326355
1400 0.6931405067443848
1500 0.6931394338607788
1600 0.6931382417678833
1700 0.6931371688842773
1800 0.6931359767913818
1900 0.6931347846984863
2000 0.6931334733963013
2100 0.6931322813034058
2200 0.6931309103965759
2300 0.6931295394897461
2400 0.6931281089782715
2500 0.6931265592575073
2600 0.6931249499320984
2700 0.6931232213973999
2800 0.6931214332580566
2900 0.6931195855140686
3000 0.6931174397468567
3100 0.693115234375
3200 0.693112850189209
3300 0.6931103467941284
3400 0.6931076049804688
3500 0.6931045055389404
3600 0.6931012272834778
3700 0.6930975914001465
3800 0.6930936574935913
3900 0.6930893659591675
4000 0.6930843591690063
4100 0.6930789947509766
4200 0.6930

## Lab-09-1 ReLU

## Lab-09-2 Weight initialization

## Lab-09-3 Dropout

## Lab-09-4 Batch Normalization