📘 **Note Format Guide**

This format serves as a structured guide for organizing lecture content, personal interpretation, experiments, and study-related questions.

| Type | What It Means | When I Use It |
|------|----------------|----------------|
| 📝 Lecture | Original material from the professor’s notes | When I’m referencing core concepts or provided code |
| 🗣️ In-Class Note | Verbal explanations shared during the lecture | When I want to record something the professor said in class but didn’t include in the official notes |
| ✍️ My Note | My thoughts, interpretations, or additional explanations | When I reflect on or explain something in my own words |
| 🔬 Experiment | Code I tried out or changed to explore further | When I test variations or go beyond the original example |
| ❓ Question | Questions I had while studying | When I want to revisit or research something more deeply |

📝
🗣️
✍️
🔬
❓

# 1. Original Lecture Notes and Video Link

[https://guebin.github.io/DL2025/posts/07wk-1.html](https://guebin.github.io/DL2025/posts/07wk-1.html)

- 🗣️
    - softmax: [□,□,□] -exp-> [△1,△2,△3] -> [△1/△sum, △2/△sum, △3/△sum] (□ = raw value, △ = exp(□))
    - sigmoid: [□,0] -exp-> [△,1] -> [△/(△+1), 1/(△+1)] # [probability of 1, probability of not 1 = probability of 0]
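
The relationship in the note above can be checked numerically. The following is a minimal sketch, assuming an arbitrary logit value `x` chosen only for illustration: sigmoid of `x` should match the first entry of softmax applied to the pair `[x, 0]`.

```python
import torch

# Hypothetical logit value, chosen only for illustration
x = torch.tensor(1.5)

# Sigmoid applied directly to the logit
p_sigmoid = torch.sigmoid(x)

# Softmax over the pair [x, 0]: exp gives [exp(x), 1], then divide by the sum
p_softmax = torch.softmax(torch.stack([x, torch.tensor(0.0)]), dim=0)

print(p_sigmoid)   # probability of class 1
print(p_softmax)   # [probability of class 1, probability of class 0]
```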

# 2. Imports 📝

In [1]:
import torch
import torchvision
import matplotlib.pyplot as plt

In [2]:
plt.rcParams['figure.figsize'] = (4.5, 3.0)

# 3. Showing Off CNNs 📝

## A. Good performance

*Fashion MNIST*

In [3]:
train_dataset = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True)
test_dataset = torchvision.datasets.FashionMNIST(root='./data', train=False, download=True)
train_dataset = torch.utils.data.Subset(train_dataset, range(5000))
test_dataset = torch.utils.data.Subset(test_dataset, range(1000))
to_tensor = torchvision.transforms.ToTensor()
X = torch.stack([to_tensor(img) for img, lbl in train_dataset]).to("cuda:0")
y = torch.tensor([lbl for img, lbl in train_dataset])
y = torch.nn.functional.one_hot(y).float().to("cuda:0")
XX = torch.stack([to_tensor(img) for img, lbl in test_dataset]).to("cuda:0")
yy = torch.tensor([lbl for img, lbl in test_dataset])
yy = torch.nn.functional.one_hot(yy).float().to("cuda:0")

🗣️(

In [6]:
X.shape # reduced to a subset because the full dataset takes too long

torch.Size([5000, 1, 28, 28])

In [7]:
y.shape, XX.shape

(torch.Size([5000, 10]), torch.Size([1000, 1, 28, 28]))

)🗣️

*A fully connected network designed with desperate, all-out effort*

In [8]:
torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784,2048),
    torch.nn.ReLU(),
    torch.nn.Linear(2048,10)
).to("cuda")
loss_fn = torch.nn.CrossEntropyLoss()
optimizr = torch.optim.Adam(net.parameters())

In [9]:
for epoc in range(1,500):
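    #1 forward pass
    logits = net(X)
    #2 compute the loss
    loss = loss_fn(logits, y)
    #3 backward pass (compute gradients)
    loss.backward()
    #4 update the parameters, then reset the gradients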
    optimizr.step()
    optimizr.zero_grad()
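
The loop above passes one-hot float labels to `CrossEntropyLoss`. The sketch below, assuming PyTorch 1.10 or later (where probability-style targets are accepted) and a small made-up batch, suggests that this gives the same loss as passing integer class indices, and that both equal the mean of `-log softmax` at the true class.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(4, 10)                 # hypothetical batch: 4 samples, 10 classes
labels = torch.tensor([3, 0, 9, 1])         # hypothetical class indices
onehot = torch.nn.functional.one_hot(labels, num_classes=10).float()

loss_fn = torch.nn.CrossEntropyLoss()
loss_from_onehot = loss_fn(logits, onehot)  # probability-style target, as in the notes
loss_from_index = loss_fn(logits, labels)   # integer class-index target

# Same quantity computed by hand: mean of -log softmax at the true class
manual = -(torch.log_softmax(logits, dim=1) * onehot).sum(dim=1).mean()

print(loss_from_onehot, loss_from_index, manual)
```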

In [10]:
(net(X).argmax(axis=1) == y.argmax(axis=1)).float().mean()

tensor(1., device='cuda:0')

- 🗣️ Nothing left to improve through further training (the extreme end of overfitting)

In [11]:
(net(XX).argmax(axis=1) == yy.argmax(axis=1)).float().mean()

tensor(0.8530, device='cuda:0')
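
The accuracy expression used in these cells compares the predicted class (argmax of the logits) with the true class (argmax of the one-hot label). A minimal sketch with made-up logits and labels, small enough to check by hand:

```python
import torch

# Hypothetical logits for 4 samples and 3 classes
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.9, 0.8, 0.1],
                       [0.1, 0.2, 3.0]])
# Hypothetical one-hot labels; the third sample's true class is 1
y = torch.tensor([[1., 0., 0.],
                  [0., 1., 0.],
                  [0., 1., 0.],
                  [0., 0., 1.]])

pred = logits.argmax(axis=1)      # predicted class per row
true = y.argmax(axis=1)           # true class recovered from the one-hot row
acc = (pred == true).float().mean()
print(acc)  # 3 of 4 predictions are correct
```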

*A convolutional network designed with very little effort*

In [12]:
torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Conv2d(1,16,2),
    torch.nn.ReLU(),
    torch.nn.MaxPool2d(2),
    torch.nn.Flatten(),
    torch.nn.Linear(2704,10),
).to("cuda")
loss_fn = torch.nn.CrossEntropyLoss()
optimizr = torch.optim.Adam(net.parameters())

In [13]:
for epoc in range(1,500):
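    #1 forward pass
    logits = net(X)
    #2 compute the loss
    loss = loss_fn(logits, y)
    #3 backward pass (compute gradients)
    loss.backward()
    #4 update the parameters, then reset the gradients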
    optimizr.step()
    optimizr.zero_grad()

In [14]:
(net(X).argmax(axis=1) == y.argmax(axis=1)).float().mean()

tensor(0.9666, device='cuda:0')

In [15]:
(net(XX).argmax(axis=1) == yy.argmax(axis=1)).float().mean()

tensor(0.8710, device='cuda:0')

- 🗣️ Less overfitting than before, and the test accuracy also improved

🗣️(

- Where 2704 comes from: the number of features coming out of Flatten

In [16]:
net

Sequential(
  (0): Conv2d(1, 16, kernel_size=(2, 2), stride=(1, 1))
  (1): ReLU()
  (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (3): Flatten(start_dim=1, end_dim=-1)
  (4): Linear(in_features=2704, out_features=10, bias=True)
)

In [17]:
net[:-1]

Sequential(
  (0): Conv2d(1, 16, kernel_size=(2, 2), stride=(1, 1))
  (1): ReLU()
  (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (3): Flatten(start_dim=1, end_dim=-1)
)

In [18]:
net[:-1](X)

tensor([[0.4116, 0.4116, 0.4116,  ..., 0.6122, 0.5657, 0.2136],
        [0.4116, 0.4128, 0.4128,  ..., 0.7154, 0.0000, 0.0000],
        [0.4116, 0.4116, 0.4116,  ..., 0.0000, 0.0000, 0.0000],
        ...,
        [0.4116, 0.4116, 0.4116,  ..., 0.0000, 0.0000, 0.0000],
        [0.4116, 0.4116, 0.4116,  ..., 0.0000, 0.0000, 0.0000],
        [0.4116, 0.4116, 0.4116,  ..., 0.0000, 0.0000, 0.0000]],
       device='cuda:0', grad_fn=<ViewBackward0>)

In [19]:
net[:-1](X).shape

torch.Size([5000, 2704])

- It is also fine to write any number first, read the error message, and then fix it
- Note: if a GPU error occurs, restart the kernel

)🗣️
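
The value 2704 can also be traced layer by layer. This is a small sketch on the CPU with a dummy all-zero image (an assumption for illustration; only the shapes matter here): the 2x2 convolution with stride 1 takes 28 to 27, the 2x2 max pooling takes 27 to 13, and 16 channels of 13x13 flatten to 16*13*13 = 2704.

```python
import torch

x = torch.zeros(1, 1, 28, 28)            # one dummy 28x28 grayscale image
conv = torch.nn.Conv2d(1, 16, 2)         # 2x2 kernel, stride 1: 28 -> 27
mp = torch.nn.MaxPool2d(2)               # 2x2 window, stride 2: 27 -> 13 (leftover row/column dropped)
flat = torch.nn.Flatten()

h1 = conv(x)
h2 = mp(h1)
h3 = flat(h2)
print(h1.shape, h2.shape, h3.shape)
```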

## B. Fewer parameters

In [20]:
net1 = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784,2048),
    torch.nn.ReLU(),
    torch.nn.Linear(2048,10)
)
net2 = torch.nn.Sequential(
    torch.nn.Conv2d(1,16,2),
    torch.nn.ReLU(),
    torch.nn.MaxPool2d(2),
    torch.nn.Flatten(),
    torch.nn.Linear(2704,10),
)

In [21]:
net1

Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=784, out_features=2048, bias=True)
  (2): ReLU()
  (3): Linear(in_features=2048, out_features=10, bias=True)
)

In [22]:
net1_params = list(net1.parameters())
print(net1_params[0].shape)
print(net1_params[1].shape)
print(net1_params[2].shape)
print(net1_params[3].shape)

torch.Size([2048, 784])
torch.Size([2048])
torch.Size([10, 2048])
torch.Size([10])


In [23]:
2048*784 + 2048 + 10*2048 + 10 

1628170

🗣️(

In [26]:
net1.parameters()

<generator object Module.parameters at 0x7f542f6dbcf0>

- generator: supports `next` / a list-like object that is convenient to loop over with a for statement

In [27]:
next(net1.parameters())

Parameter containing:
tensor([[-0.0290,  0.0095, -0.0319,  ..., -0.0157, -0.0290, -0.0092],
        [-0.0115, -0.0210, -0.0033,  ..., -0.0106, -0.0276, -0.0034],
        [-0.0270, -0.0204,  0.0075,  ..., -0.0183, -0.0071,  0.0063],
        ...,
        [ 0.0150, -0.0349,  0.0023,  ...,  0.0235,  0.0061,  0.0350],
        [-0.0234, -0.0295, -0.0202,  ..., -0.0353, -0.0169, -0.0149],
        [-0.0315,  0.0171, -0.0010,  ..., -0.0192, -0.0257,  0.0305]],
       requires_grad=True)

In [28]:
list(net1.parameters())

[Parameter containing:
 tensor([[-0.0290,  0.0095, -0.0319,  ..., -0.0157, -0.0290, -0.0092],
         [-0.0115, -0.0210, -0.0033,  ..., -0.0106, -0.0276, -0.0034],
         [-0.0270, -0.0204,  0.0075,  ..., -0.0183, -0.0071,  0.0063],
         ...,
         [ 0.0150, -0.0349,  0.0023,  ...,  0.0235,  0.0061,  0.0350],
         [-0.0234, -0.0295, -0.0202,  ..., -0.0353, -0.0169, -0.0149],
         [-0.0315,  0.0171, -0.0010,  ..., -0.0192, -0.0257,  0.0305]],
        requires_grad=True),
 Parameter containing:
 tensor([ 0.0101, -0.0294,  0.0133,  ...,  0.0234,  0.0283, -0.0351],
        requires_grad=True),
 Parameter containing:
 tensor([[-1.2926e-02, -9.9654e-03, -1.9394e-02,  ..., -4.1905e-05,
          -1.8466e-02, -1.9714e-02],
         [-5.3193e-03,  1.1461e-02,  4.5922e-03,  ..., -2.1761e-02,
           2.1826e-02,  1.1969e-02],
         [-4.7551e-03, -2.1131e-02, -6.7052e-03,  ...,  2.1758e-02,
           9.4742e-03, -6.0024e-04],
         ...,
         [ 8.4402e-03,  3.9834e-0

In [29]:
net1_params = list(net1.parameters())
print(net1_params[0].shape)
print(net1_params[1].shape) # bias
print(net1_params[2].shape) # weight (w-hat)
print(net1_params[3].shape) # bias

torch.Size([2048, 784])
torch.Size([2048])
torch.Size([10, 2048])
torch.Size([10])


)🗣️
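
To see which entry of the parameter list is a weight and which is a bias without relying on position, `named_parameters()` can be used instead of `parameters()`. A minimal sketch, rebuilding `net1` on the CPU:

```python
import torch

net1 = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784, 2048),
    torch.nn.ReLU(),
    torch.nn.Linear(2048, 10),
)

# named_parameters() yields (name, tensor) pairs; the number in each name
# is the layer's index inside the Sequential container
names = []
for name, p in net1.named_parameters():
    names.append(name)
    print(name, tuple(p.shape))
```

Flatten and ReLU have no parameters, so only indices 1 and 3 appear in the names.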

- 🗣️ A net with many parameters is expensive (the whole net has to be loaded onto the GPU)

In [30]:
net2

Sequential(
  (0): Conv2d(1, 16, kernel_size=(2, 2), stride=(1, 1))
  (1): ReLU()
  (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (3): Flatten(start_dim=1, end_dim=-1)
  (4): Linear(in_features=2704, out_features=10, bias=True)
)

In [31]:
net2_params = list(net2.parameters())
print(net2_params[0].shape)
print(net2_params[1].shape)
print(net2_params[2].shape) # weight (w-hat)
print(net2_params[3].shape) # bias

torch.Size([16, 1, 2, 2])
torch.Size([16])
torch.Size([10, 2704])
torch.Size([10])


In [32]:
16*1*2*2 + 16 + 10*2704 + 10 

27130

In [33]:
27130/1628170

0.01666287918337766
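
The hand-computed counts can be cross-checked programmatically. In this sketch, `count_params` is a hypothetical helper name introduced only for illustration; it sums `numel()` over every parameter tensor.

```python
import torch

net1 = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784, 2048),
    torch.nn.ReLU(),
    torch.nn.Linear(2048, 10),
)
net2 = torch.nn.Sequential(
    torch.nn.Conv2d(1, 16, 2),
    torch.nn.ReLU(),
    torch.nn.MaxPool2d(2),
    torch.nn.Flatten(),
    torch.nn.Linear(2704, 10),
)

def count_params(net):
    # numel() returns the number of scalar entries in each parameter tensor
    return sum(p.numel() for p in net.parameters())

n1 = count_params(net1)
n2 = count_params(net2)
print(n1, n2, n2 / n1)
```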

- 🗣️ It was put together with little effort, yet it also performs well

🗣️(

In [34]:
net2_params = list(net2.parameters())
print(net2_params[0].shape)

torch.Size([16, 1, 2, 2])


- The weight has many dimensions (4D): (out_channels, in_channels, kernel height, kernel width)

)🗣️

## C. Famous

`-` <https://brunch.co.kr/@hvnpoet/109> 

🗣️(

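- Deep learning superstars -- Hinton, LeCun, Bengio, Ng
    - Hinton -- DBN (published in Science) ---> showed that a deep neural network can still be trained.
        - Drew little attention
    - Hinton's graduate student: Alex --> CIFAR10 (image data)
        - Another graduate student suggested entering a competition -> they entered
    - 1. My computer is too slow --> GPU
      2. Overfitting --> dropout
      3. Local minima, vanishing gradients, ... (before Adam was developed) --> used ReLU (developed in Bengio's lab)
    - 1st place <-- improving by even 1% is impressive, but they improved by 10% (2012)
    - 2014 <-- Adam

- These days, even better models (Transformers) have come along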

)🗣️

# 4. Core CNN Layers 📝

- 🗣️
    - 합성곱신경망 = convolutional neural network = CNN
    - What we have learned so far: (linr -> relu) // (linr -> relu) // ...
    - CNN: (conv -> relu -> mp) // (conv -> relu -> mp) // ...
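
The two patterns in the note above can be written side by side. This is only a sketch: the layer widths (64, 32) and channel counts (16, 32) are arbitrary choices for illustration, not values from the lecture, and the input is a dummy batch of 28x28 images.

```python
import torch

# Pattern learned so far: (linr -> relu) blocks
mlp = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 32), torch.nn.ReLU(),
    torch.nn.Linear(32, 10),
)

# CNN pattern: (conv -> relu -> mp) blocks, then flatten and a final linear layer
cnn = torch.nn.Sequential(
    torch.nn.Conv2d(1, 16, 2), torch.nn.ReLU(), torch.nn.MaxPool2d(2),   # 28 -> 27 -> 13
    torch.nn.Conv2d(16, 32, 2), torch.nn.ReLU(), torch.nn.MaxPool2d(2),  # 13 -> 12 -> 6
    torch.nn.Flatten(),
    torch.nn.Linear(32 * 6 * 6, 10),
)

x = torch.zeros(4, 1, 28, 28)  # dummy batch of four 28x28 grayscale images
out_mlp = mlp(x)
out_cnn = cnn(x)
print(out_mlp.shape, out_cnn.shape)
```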

## A. `torch.nn.ReLU`

**(Example 1) How the operation works**

In [35]:
img = torch.randn(1,1,4,4) # one (4,4) grayscale image
relu = torch.nn.ReLU()

🗣️ (obs, channel, (img size))

In [36]:
img

tensor([[[[ 1.4381,  0.2449, -0.6420,  2.6874],
          [ 0.7790,  1.0558,  0.7939,  0.1099],
          [ 0.3492,  1.7610,  1.6032,  2.4212],
          [ 0.5416, -0.2153, -1.2772,  0.6885]]]])

In [37]:
relu(img)

tensor([[[[1.4381, 0.2449, 0.0000, 2.6874],
          [0.7790, 1.0558, 0.7939, 0.1099],
          [0.3492, 1.7610, 1.6032, 2.4212],
          [0.5416, 0.0000, 0.0000, 0.6885]]]])
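
As the output above suggests, ReLU keeps positive entries and replaces negative ones with 0. A short sketch using the same image values, comparing the layer with an element-wise `max(x, 0)` written via `torch.clamp`:

```python
import torch

# Same values as the image printed in the notes
img = torch.tensor([[[[ 1.4381,  0.2449, -0.6420,  2.6874],
                      [ 0.7790,  1.0558,  0.7939,  0.1099],
                      [ 0.3492,  1.7610,  1.6032,  2.4212],
                      [ 0.5416, -0.2153, -1.2772,  0.6885]]]])

relu = torch.nn.ReLU()
by_layer = relu(img)
by_hand = torch.clamp(img, min=0)   # element-wise max(x, 0)
print(by_layer)
```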

## B. `torch.nn.MaxPool2d`

**(Example 1) How the operation works, and what kernel_size means**

In [38]:
img = torch.rand(1,1,4,4)
mp = torch.nn.MaxPool2d(kernel_size=2)

In [39]:
img

tensor([[[[0.8921, 0.4222, 0.5778, 0.2707],
          [0.6921, 0.5627, 0.5356, 0.1048],
          [0.5356, 0.7699, 0.9047, 0.5911],
          [0.3617, 0.5345, 0.1218, 0.4772]]]])

In [40]:
mp(img)

tensor([[[[0.8921, 0.5778],
          [0.7699, 0.9047]]]])

🗣️ Split the image into 2*2 windows and write down the max value of each window
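
The window-by-window description can be reproduced with an explicit loop. A minimal sketch using the same image values as the cell above; the loop variables and `by_hand` are names introduced here for illustration.

```python
import torch

# Same values as the image printed in the notes
img = torch.tensor([[[[0.8921, 0.4222, 0.5778, 0.2707],
                      [0.6921, 0.5627, 0.5356, 0.1048],
                      [0.5356, 0.7699, 0.9047, 0.5911],
                      [0.3617, 0.5345, 0.1218, 0.4772]]]])
mp = torch.nn.MaxPool2d(kernel_size=2)

# Take the max of each non-overlapping 2x2 window by hand
by_hand = torch.zeros(1, 1, 2, 2)
for i in range(2):
    for j in range(2):
        window = img[:, :, 2*i:2*i+2, 2*j:2*j+2]
        by_hand[0, 0, i, j] = window.max()

by_layer = mp(img)
print(by_hand)
print(by_layer)
```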

**(Example 2) What if the kernel does not fit the image size exactly?**

In [41]:
img = torch.rand(1,1,5,5)
mp = torch.nn.MaxPool2d(kernel_size=3)

In [42]:
img

tensor([[[[0.9560, 0.4947, 0.1591, 0.2606, 0.9130],
          [0.0603, 0.1255, 0.6520, 0.2504, 0.8759],
          [0.7544, 0.5927, 0.5319, 0.2390, 0.2883],
          [0.9470, 0.8519, 0.3501, 0.0725, 0.3881],
          [0.7203, 0.0753, 0.8360, 0.1287, 0.9515]]]])

In [43]:
mp(img)

tensor([[[[0.9560]]]])

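- 🗣️ Behavior differs depending on the implementation (library/version)
    - PyTorch simply discards the leftover part by default
    - Some implementations instead record the max of the leftover part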
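
In PyTorch, the second behavior is available through the `ceil_mode` argument of `MaxPool2d`. The sketch below reuses the 5x5 image values from the cell above and compares the default (leftover dropped) with `ceil_mode=True` (leftover windows kept and pooled).

```python
import torch

# Same values as the 5x5 image printed in the notes
img = torch.tensor([[[[0.9560, 0.4947, 0.1591, 0.2606, 0.9130],
                      [0.0603, 0.1255, 0.6520, 0.2504, 0.8759],
                      [0.7544, 0.5927, 0.5319, 0.2390, 0.2883],
                      [0.9470, 0.8519, 0.3501, 0.0725, 0.3881],
                      [0.7203, 0.0753, 0.8360, 0.1287, 0.9515]]]])

mp_floor = torch.nn.MaxPool2d(kernel_size=3)                  # default: leftover rows/columns are dropped
mp_ceil = torch.nn.MaxPool2d(kernel_size=3, ceil_mode=True)   # partial windows at the border are kept

out_floor = mp_floor(img)
out_ceil = mp_ceil(img)
print(out_floor)
print(out_ceil)
```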

**(Example 3) A non-square kernel**

In [44]:
img = torch.rand(1,1,4,4)
mp = torch.nn.MaxPool2d(kernel_size=(4,2))

In [45]:
img

tensor([[[[0.4283, 0.9998, 0.3532, 0.3085],
          [0.3278, 0.8575, 0.3331, 0.9769],
          [0.0239, 0.2457, 0.8468, 0.8224],
          [0.9593, 0.1292, 0.5930, 0.3652]]]])

In [46]:
mp(img)

tensor([[[[0.9998, 0.9769]]]])

## C. `torch.nn.Conv2d`

**(Example 1) How the operation works, stride=2**

In [47]:
img = torch.rand(1,1,4,4) # (?, in_channels, ?, ?) 
conv = torch.nn.Conv2d(in_channels=1,out_channels=1,kernel_size=2,stride=2) # stride=2: move the window 2 cells at a time (similar to the max pooling examples above)

In [48]:
img

tensor([[[[0.7679, 0.3459, 0.6509, 0.7905],
          [0.1166, 0.8762, 0.9373, 0.8573],
          [0.5778, 0.8702, 0.9686, 0.5854],
          [0.1373, 0.3530, 0.0529, 0.0139]]]])

In [49]:
conv(img)

tensor([[[[ 0.1106, -0.1898],
          [ 0.0529, -0.0976]]]], grad_fn=<ConvolutionBackward0>)

??

🗣️ Hard to guess right away how these values were computed

🗣️(

In [52]:
img[:, :, :2, :2], img

(tensor([[[[0.7679, 0.3459],
           [0.1166, 0.8762]]]]),
 tensor([[[[0.7679, 0.3459, 0.6509, 0.7905],
           [0.1166, 0.8762, 0.9373, 0.8573],
           [0.5778, 0.8702, 0.9686, 0.5854],
           [0.1373, 0.3530, 0.0529, 0.0139]]]]))

In [53]:
img[:, :, :2, 2:], img

(tensor([[[[0.6509, 0.7905],
           [0.9373, 0.8573]]]]),
 tensor([[[[0.7679, 0.3459, 0.6509, 0.7905],
           [0.1166, 0.8762, 0.9373, 0.8573],
           [0.5778, 0.8702, 0.9686, 0.5854],
           [0.1373, 0.3530, 0.0529, 0.0139]]]]))

In [54]:
img[:, :, 2:, :2], img

(tensor([[[[0.5778, 0.8702],
           [0.1373, 0.3530]]]]),
 tensor([[[[0.7679, 0.3459, 0.6509, 0.7905],
           [0.1166, 0.8762, 0.9373, 0.8573],
           [0.5778, 0.8702, 0.9686, 0.5854],
           [0.1373, 0.3530, 0.0529, 0.0139]]]]))

In [55]:
img[:, :, 2:, 2:], img

(tensor([[[[0.9686, 0.5854],
           [0.0529, 0.0139]]]]),
 tensor([[[[0.7679, 0.3459, 0.6509, 0.7905],
           [0.1166, 0.8762, 0.9373, 0.8573],
           [0.5778, 0.8702, 0.9686, 0.5854],
           [0.1373, 0.3530, 0.0529, 0.0139]]]]))

In [57]:
conv(img) # the output carries a grad_fn tag -> the layer has parameters

tensor([[[[ 0.1106, -0.1898],
          [ 0.0529, -0.0976]]]], grad_fn=<ConvolutionBackward0>)

In [58]:
conv.weight.data, conv.bias.data

(tensor([[[[-0.0218,  0.2400],
           [-0.4914,  0.3394]]]]),
 tensor([-0.1958]))

In [59]:
img[:, :, :2, :2], img

(tensor([[[[0.7679, 0.3459],
           [0.1166, 0.8762]]]]),
 tensor([[[[0.7679, 0.3459, 0.6509, 0.7905],
           [0.1166, 0.8762, 0.9373, 0.8573],
           [0.5778, 0.8702, 0.9686, 0.5854],
           [0.1373, 0.3530, 0.0529, 0.0139]]]]))

In [60]:
-0.0218 * 0.7679

-0.01674022

In [61]:
img[:, :, :2, :2]*conv.weight.data, img # element-wise product, not matrix multiplication

(tensor([[[[-0.0167,  0.0830],
           [-0.0573,  0.2974]]]]),
 tensor([[[[0.7679, 0.3459, 0.6509, 0.7905],
           [0.1166, 0.8762, 0.9373, 0.8573],
           [0.5778, 0.8702, 0.9686, 0.5854],
           [0.1373, 0.3530, 0.0529, 0.0139]]]]))

In [62]:
(img[:, :, :2, :2]*conv.weight.data).sum()+conv.bias.data, conv(img)

(tensor([0.1106]),
 tensor([[[[ 0.1106, -0.1898],
           [ 0.0529, -0.0976]]]], grad_fn=<ConvolutionBackward0>))

In [64]:
(img[:, :, :2, 2:]*conv.weight.data).sum()+conv.bias.data, conv(img) # second value

(tensor([-0.1898]),
 tensor([[[[ 0.1106, -0.1898],
           [ 0.0529, -0.0976]]]], grad_fn=<ConvolutionBackward0>))

In [70]:
(img[:, :, 2:, :2]*conv.weight.data).sum()+conv.bias.data, conv(img) # third value

(tensor([0.0529]),
 tensor([[[[ 0.1106, -0.1898],
           [ 0.0529, -0.0976]]]], grad_fn=<ConvolutionBackward0>))

In [71]:
(img[:, :, 2:, 2:]*conv.weight.data).sum()+conv.bias.data, conv(img) # fourth value

(tensor([-0.0976]),
 tensor([[[[ 0.1106, -0.1898],
           [ 0.0529, -0.0976]]]], grad_fn=<ConvolutionBackward0>))

)🗣️
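
The four window computations above can be collected into one loop. This sketch hard-codes the image, weight, and bias values printed in the in-class cells (rounded to four decimals, so results agree with the printed output only up to rounding) and compares the loop with `torch.nn.functional.conv2d` using `stride=2`.

```python
import torch
import torch.nn.functional as F

# Values copied from the printed outputs in the notes (rounded to 4 decimals)
img = torch.tensor([[[[0.7679, 0.3459, 0.6509, 0.7905],
                      [0.1166, 0.8762, 0.9373, 0.8573],
                      [0.5778, 0.8702, 0.9686, 0.5854],
                      [0.1373, 0.3530, 0.0529, 0.0139]]]])
weight = torch.tensor([[[[-0.0218, 0.2400],
                         [-0.4914, 0.3394]]]])
bias = torch.tensor([-0.1958])

# For each 2x2 window: element-wise product with the kernel, sum, then add the bias
by_hand = torch.zeros(1, 1, 2, 2)
for i in range(2):
    for j in range(2):
        window = img[:, :, 2*i:2*i+2, 2*j:2*j+2]
        by_hand[0, 0, i, j] = (window * weight).sum() + bias[0]

# Same computation through the functional API
by_func = F.conv2d(img, weight, bias, stride=2)
print(by_hand)
print(by_func)
```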

In [7]:
conv.weight.data, conv.bias.data

(tensor([[[[ 0.3095,  0.0207],
           [-0.3130,  0.2836]]]]),
 tensor([-0.2675]))

In [8]:
(img[:,  :,  :2,  :2] * conv.weight.data).sum()+conv.bias.data, conv(img)

(tensor([-0.3077]),
 tensor([[[[-0.3077, -0.4760],
           [ 0.0550, -0.0650]]]], grad_fn=<ConvolutionBackward0>))

In [9]:
(img[:,  :,  :2,  2:] * conv.weight.data).sum()+conv.bias.data, conv(img)

(tensor([-0.4760]),
 tensor([[[[-0.3077, -0.4760],
           [ 0.0550, -0.0650]]]], grad_fn=<ConvolutionBackward0>))

In [11]:
(img[:,  :,  2:,  :2] * conv.weight.data).sum()+conv.bias.data, conv(img)

(tensor([0.0550]),
 tensor([[[[-0.3077, -0.4760],
           [ 0.0550, -0.0650]]]], grad_fn=<ConvolutionBackward0>))

In [12]:
(img[:,  :,  2:,  2:] * conv.weight.data).sum()+conv.bias.data, conv(img)

(tensor([-0.0650]),
 tensor([[[[-0.3077, -0.4760],
           [ 0.0550, -0.0650]]]], grad_fn=<ConvolutionBackward0>))