# Implementation of Softmax Regression from Scratch
:label:`chapter_softmax_scratch`

Just as we implemented linear regression from scratch, let's now do the same for multiclass logistic (softmax) regression.  
We start with the imports.

In [2]:
import sys
sys.path.insert(0, '..')

%matplotlib inline
import d2l
import torch
from torch.distributions import normal

In [3]:
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)



## Initialize Model Parameters
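Note: recall that each image is 28 x 28 pixels.  
We can therefore treat each example as a $784$-dimensional vector.  
  
Because the dataset has 10 categories, our network will produce a 10-dimensional output.  
Consequently, the weights form a $784 \times 10$ matrix and the bias is a $1 \times 10$ vector.  
As in linear regression, we initialize the weights $W$ with Gaussian noise and set the bias to an initial value of $0$.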
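
The shapes described above can be checked with a short sketch. The mini-batch `images` is a hypothetical stand-in for a batch from `train_iter`, and `0.01 * torch.randn(...)` is used here as an assumed equivalent of sampling from a Gaussian with standard deviation 0.01.

```python
import torch

num_inputs, num_outputs = 784, 10

# Hypothetical mini-batch of 4 grayscale images, each 1 x 28 x 28
images = torch.rand(4, 1, 28, 28)

# Flatten every image into a 784-dimensional row vector
X_flat = images.reshape(-1, num_inputs)

# Parameters with the shapes described in the text
W = 0.01 * torch.randn(num_inputs, num_outputs)  # Gaussian noise, std 0.01
b = torch.zeros(num_outputs)                     # bias starts at 0

# One row of 10 scores per example; b is broadcast across the rows
logits = X_flat @ W + b
print(X_flat.shape, logits.shape)
```

Each example ends up with one score per category, which is the input that softmax will later turn into probabilities.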

In [4]:
num_inputs = 784
num_outputs = 10

W = normal.Normal(loc = 0, scale = 0.01).sample((num_inputs, num_outputs))
b = torch.zeros(num_outputs)

In [5]:
W.requires_grad_(True)
b.requires_grad_(True)

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], requires_grad=True)

We have attached gradients to the model parameters, so autograd will record the operations applied to `W` and `b`.
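
To see what tracking gradients means in practice, here is a minimal sketch on hypothetical two-element parameters (`w`, `b`, and `x` are illustrative names, not the model's actual tensors). After `backward()`, each tracked tensor holds the derivative of the scalar loss with respect to itself.

```python
import torch

# Hypothetical tiny parameters, flagged so that autograd tracks them
w = torch.tensor([1.0, 2.0], requires_grad=True)
b = torch.zeros(1, requires_grad=True)

x = torch.tensor([3.0, 4.0])

# A scalar "loss": w . x + b
loss = ((w * x).sum() + b).sum()
loss.backward()

print(w.grad)  # derivative of the loss w.r.t. w, which equals x
print(b.grad)  # derivative of the loss w.r.t. b, which equals 1
```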

## The Softmax
Before implementing softmax, let's review how `torch.sum` works along a given dimension.

In [8]:
X = torch.tensor([[1, 2, 3], [4, 5, 6]])
torch.sum(X, dim = 0, keepdim=True), torch.sum(X, dim = 1, keepdim=True)

(tensor([[5, 7, 9]]),
 tensor([[ 6],
         [15]]))
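
The `keepdim=True` argument matters for the normalization step in softmax. The sketch below, using the same values as above but as floats, suggests why: keeping the summed dimension gives a `(2, 1)` result that broadcasts cleanly against the `(2, 3)` input, whereas the `(2,)` result without `keepdim` would not broadcast against this shape.

```python
import torch

X = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

row_sums_keep = torch.sum(X, dim=1, keepdim=True)  # shape (2, 1)
row_sums_flat = torch.sum(X, dim=1)                # shape (2,)

# (2, 3) / (2, 1) broadcasts across columns, so each row is
# divided by its own sum
normalized = X / row_sums_keep

print(row_sums_keep.shape, row_sums_flat.shape)
print(normalized.sum(dim=1))  # each row now sums to 1
```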

We can now implement the softmax function.
$$
\mathrm{softmax}(\mathbf{X})_{ij} = \frac{\exp(X_{ij})}{\sum_k \exp(X_{ik})}
$$

In [9]:
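def softmax(X):
    # Exponentiate every entry so that all values are positive
    X_exp = torch.exp(X)
    # Sum over each row; keepdim=True keeps shape (n, 1) for broadcasting
    partition = torch.sum(X_exp, dim = 1, keepdim = True)
    # Divide each row by its own sum so that it adds up to 1
    return X_exp / partition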

In [10]:
X = normal.Normal(loc = 0, scale = 1).sample((2, 5))
X_prob = softmax(X)
X_prob, torch.sum(X_prob, dim=1)

(tensor([[0.1246, 0.1006, 0.0395, 0.6286, 0.1068],
         [0.0748, 0.7593, 0.0509, 0.0692, 0.0457]]),
 tensor([1., 1.]))
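
As a sanity check, the from-scratch function can be compared against PyTorch's built-in `torch.softmax`. This is only a sketch: the seed and the input shape are arbitrary choices, and `softmax` is redefined here so that the block runs on its own.

```python
import torch

def softmax(X):
    # Same from-scratch definition as above
    X_exp = torch.exp(X)
    partition = torch.sum(X_exp, dim=1, keepdim=True)
    return X_exp / partition

torch.manual_seed(0)       # arbitrary seed, for repeatability only
X = torch.randn(2, 5)

ours = softmax(X)
reference = torch.softmax(X, dim=1)

# The two results should agree up to floating-point error
matches = torch.allclose(ours, reference, atol=1e-6)
print(matches)
```

For moderately sized inputs such as these the two should agree. For very large inputs `torch.exp` can overflow, which is one reason a library implementation is usually the safer choice outside of a from-scratch exercise.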