### Concise Implementation of Softmax Regression

In [None]:
from d2l import torch as d2l
import torch
from torch import nn

As in the section "Concise Implementation of Linear Regression", we will use PyTorch's high-level APIs to train a softmax regression model.

We will stick with the Fashion-MNIST dataset and keep the batch size at 256, as in the previous section.
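As a quick arithmetic check of what a batch size of 256 means, the sketch below counts the minibatches per epoch. It assumes the standard split of 60,000 training and 10,000 test images, and that the data loader keeps the final partial minibatch (PyTorch's `DataLoader` default, `drop_last=False`).

```python
import math

# Standard Fashion-MNIST split (assumed): 60,000 training and 10,000 test images.
num_train, num_test = 60_000, 10_000
batch_size = 256

# Assuming the last, smaller minibatch is kept rather than dropped,
# one pass over the data needs ceil(n / batch_size) minibatches.
train_batches = math.ceil(num_train / batch_size)
test_batches = math.ceil(num_test / batch_size)

# Size of the final, partial training minibatch.
last_train_batch = num_train - (train_batches - 1) * batch_size

print(train_batches, test_batches, last_train_batch)  # 235 40 96
```

So each training epoch performs 235 parameter updates, the last one on only 96 images.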

In [None]:
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)

First, let's define our neural net.

In [None]:
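# Flatten each (1, 28, 28) image into a 784-element vector, then map it to 10 class scores (logits)
net = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))

def init_weights(m):
    # Draw the weights of every linear layer from N(0, 0.01^2); biases keep their default init
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, std=0.01)

# Recursively apply init_weights to every submodule; the trailing ';' suppresses notebook output
net.apply(init_weights);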

Recall that the output layer of softmax regression is a fully-connected layer. Therefore, to implement our model, we only need to add one fully-connected layer with 10 outputs to our `Sequential`.
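To make "a fully-connected layer with 10 outputs" concrete, here is a plain-Python sketch of what such a layer computes for a single example, followed by the softmax that turns the resulting logits into probabilities. The toy dimensions (4 inputs, 3 classes) and weights are made up for illustration. `nn.Linear(784, 10)` performs the same computation in batched, vectorized form, and the softmax is not part of `net` because it is folded into the loss function later.

```python
import math

def linear(x, W, b):
    """Fully-connected layer: o_j = sum_i W[j][i] * x[i] + b[j].
    W is stored as (outputs, inputs), the same layout nn.Linear uses for its weight."""
    return [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + b_j
            for row, b_j in zip(W, b)]

def softmax(o):
    m = max(o)  # subtract the largest logit so exp() cannot overflow
    exps = [math.exp(o_j - m) for o_j in o]
    total = sum(exps)
    return [e / total for e in exps]

# Toy example: 4 input features, 3 classes (the model above uses 784 and 10).
x = [1.0, 0.5, -1.0, 2.0]
W = [[0.1, 0.0, 0.2, 0.0],
     [0.0, 0.3, 0.0, 0.1],
     [0.2, 0.1, 0.0, 0.0]]
b = [0.0, 0.1, -0.1]

logits = linear(x, W, b)  # approximately [-0.1, 0.45, 0.15]
probs = softmax(logits)   # non-negative, sums to 1, largest for class 1
print(logits, probs)
```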

We initialize the weights at random from a normal distribution with zero mean and standard deviation 0.01. To see what the network looks like, we can print it with `print(net)`.
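The initialization can be mimicked in plain Python as a rough sketch. It draws a 10 × 784 weight matrix from a normal distribution with mean 0 and standard deviation 0.01 and counts the model's parameters. Note that `init_weights` only touches the weight matrix; the 10 biases keep `nn.Linear`'s default initialization, which this sketch does not reproduce.

```python
import random
import statistics

num_inputs, num_outputs = 784, 10
random.seed(0)  # fixed seed so the sketch is reproducible

# One row per output class, each entry drawn from N(0, 0.01^2),
# mirroring nn.init.normal_(m.weight, std=0.01) in init_weights.
W = [[random.gauss(0.0, 0.01) for _ in range(num_inputs)]
     for _ in range(num_outputs)]

flat = [w for row in W for w in row]
num_params = num_inputs * num_outputs + num_outputs  # 7,840 weights + 10 biases

print(num_params)                                   # 7850
print(statistics.mean(flat), statistics.stdev(flat))  # close to 0 and 0.01
```

With 7,840 samples, the empirical mean and standard deviation land very close to the target values, so the whole model starts with small logits near zero.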

Note that PyTorch does not implicitly reshape the inputs. Thus, we add a `Flatten` layer before the linear layer to reshape each input image of shape (1, 28, 28) into a vector of length 784.
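A plain-Python sketch of what flattening does, using a made-up 1 × 3 × 3 "image" in place of a 1 × 28 × 28 Fashion-MNIST image. Values are read in row-major order (channel, then row, then column), and the batch dimension is kept, which matches `nn.Flatten`'s default of flattening from dimension 1 onward.

```python
# A toy image with shape (channels, height, width) = (1, 3, 3).
image = [[[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]]

def flatten(img):
    # Row-major order: channel, then row, then column.
    return [v for channel in img for row in channel for v in row]

vec = flatten(image)
print(vec)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# For a minibatch, each example is flattened separately, so the batch
# dimension survives: shape (2, 1, 3, 3) becomes (2, 9).
batch = [image, image]
flat_batch = [flatten(img) for img in batch]
print(len(flat_batch), len(flat_batch[0]))  # 2 9

# For Fashion-MNIST the same operation yields 1 * 28 * 28 = 784 features.
print(1 * 28 * 28)  # 784
```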

Let's define the loss function and the optimization method.

Rather than building everything from scratch, we will call the built-in loss function `CrossEntropyLoss` from PyTorch.

Note that instead of passing softmax probabilities into the loss function, we pass the logits (the unnormalized outputs of the linear layer). `CrossEntropyLoss` computes the softmax and its logarithm in a single step, using numerical tricks such as the "LogSumExp trick" to avoid overflow and underflow.
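The following plain-Python sketch shows why computing the loss directly from logits helps. For one example with logits $o$ and label $y$, the cross-entropy loss is $-\log \mathrm{softmax}(o)_y = \log \sum_j \exp(o_j) - o_y$, and the LogSumExp term can be evaluated stably by factoring out $m = \max_j o_j$. This illustrates the idea only and is not PyTorch's actual implementation. For a minibatch, `CrossEntropyLoss` additionally averages the per-example losses by default.

```python
import math

def naive_cross_entropy(logits, y):
    # Softmax first, then log: exp() overflows once the logits get large.
    exps = [math.exp(o) for o in logits]
    return -math.log(exps[y] / sum(exps))

def stable_cross_entropy(logits, y):
    # -log softmax(o)_y = logsumexp(o) - o_y, with
    # logsumexp(o) = m + log(sum_j exp(o_j - m)) and m = max(o).
    m = max(logits)
    lse = m + math.log(sum(math.exp(o - m) for o in logits))
    return lse - logits[y]

small = [2.0, 1.0, 0.1]
print(naive_cross_entropy(small, 0), stable_cross_entropy(small, 0))  # agree

large = [1000.0, 999.0, 998.0]
print(stable_cross_entropy(large, 0))  # finite, about 0.4076
try:
    naive_cross_entropy(large, 0)
except OverflowError as err:
    print("naive version failed:", err)
```

Both functions agree on moderate logits, but only the stable version survives logits around 1000, where `math.exp` overflows.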

In [None]:
loss = nn.CrossEntropyLoss()

As the optimization algorithm, we use minibatch stochastic gradient descent with a learning rate of 0.1.

Note that this is the same `SGD` optimizer we used in the concise implementation of linear regression. This shows how generally applicable the optimizers are: the same optimizer works for any model whose parameters we pass to it.
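The update rule itself is model-agnostic, which is why the same optimizer can be reused. Below is a plain-Python sketch of one step of vanilla minibatch SGD, $p \leftarrow p - \eta \, g$, where $g$ is the gradient of the minibatch loss. With only `lr` set (no momentum or weight decay), `torch.optim.SGD` applies this update to every tensor in `net.parameters()`. The flat list layout and the `sgd_step` name are hypothetical, chosen for illustration.

```python
def sgd_step(params, grads, lr):
    """One vanilla SGD update, p <- p - lr * g, for every parameter.
    params and grads are parallel lists of floats (a hypothetical flat layout)."""
    return [p - lr * g for p, g in zip(params, grads)]

params = [0.5, -0.2, 1.0]
grads = [0.1, -0.4, 0.0]  # gradients of the minibatch loss w.r.t. each parameter

updated = sgd_step(params, grads, lr=0.1)
print(updated)  # approximately [0.49, -0.16, 1.0]
```

In PyTorch, a training step typically clears old gradients with `trainer.zero_grad()`, computes new ones with `backward()` on the loss, and then calls `trainer.step()` to perform this update.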

In [None]:
trainer = torch.optim.SGD(net.parameters(), lr=0.1)

Finally, we call the training function defined in the previous lecture to train the model.

In [None]:
num_epochs = 10
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)

Even though we now need far fewer lines of code than when we implemented everything from scratch, this model still converges to a solution that achieves decent accuracy.
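The accuracy tracked during training is, presumably, the usual classification accuracy: the fraction of examples whose highest-scoring class matches the label. A plain-Python sketch of that metric follows. The `argmax` and `accuracy` helpers are hypothetical and not d2l's API. Because softmax is monotonic, the argmax of the logits equals the argmax of the probabilities, so no softmax is needed to make predictions.

```python
def argmax(row):
    # Index of the largest score (first one wins on ties).
    return max(range(len(row)), key=row.__getitem__)

def accuracy(logits_batch, labels):
    """Fraction of examples whose highest-scoring class equals the label."""
    correct = sum(argmax(row) == y for row, y in zip(logits_batch, labels))
    return correct / len(labels)

logits_batch = [[2.0, 0.1, -1.0],   # predicts 0, label 0: correct
                [0.3, 0.2, 1.5],    # predicts 2, label 2: correct
                [0.0, 3.0, 0.5],    # predicts 1, label 2: wrong
                [1.0, 0.9, 0.8]]    # predicts 0, label 1: wrong
labels = [0, 2, 2, 1]

print(accuracy(logits_batch, labels))  # 0.5
```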