Build a model based on fully connected layers to classify Fashion MNIST from the torchvision library.
Reach an accuracy of at least 88% on the test set.

In [1]:
!pip install torchvision



In [2]:
import numpy as np
import torch
import torchvision as tv
import time

datasets: https://pytorch.org/vision/stable/datasets.html

In [3]:
BATCH_SIZE=256

train_dataset = tv.datasets.FashionMNIST('.', train=True, transform=tv.transforms.ToTensor(), download=True)
test_dataset = tv.datasets.FashionMNIST('.', train=False, transform=tv.transforms.ToTensor(), download=True)
train = torch.utils.data.DataLoader(train_dataset, batch_size=BATCH_SIZE)
test = torch.utils.data.DataLoader(test_dataset, batch_size=BATCH_SIZE)

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to .\FashionMNIST\raw\train-images-idx3-ubyte.gz


100%|█████████████████████████████████████████████████████████████████| 26421880/26421880 [00:02<00:00, 9524816.08it/s]


Extracting .\FashionMNIST\raw\train-images-idx3-ubyte.gz to .\FashionMNIST\raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to .\FashionMNIST\raw\train-labels-idx1-ubyte.gz


100%|████████████████████████████████████████████████████████████████████████| 29515/29515 [00:00<00:00, 704617.10it/s]


Extracting .\FashionMNIST\raw\train-labels-idx1-ubyte.gz to .\FashionMNIST\raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to .\FashionMNIST\raw\t10k-images-idx3-ubyte.gz


100%|███████████████████████████████████████████████████████████████████| 4422102/4422102 [00:00<00:00, 7024934.72it/s]


Extracting .\FashionMNIST\raw\t10k-images-idx3-ubyte.gz to .\FashionMNIST\raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to .\FashionMNIST\raw\t10k-labels-idx1-ubyte.gz


100%|█████████████████████████████████████████████████████████████████████████| 5148/5148 [00:00<00:00, 5163146.10it/s]

Extracting .\FashionMNIST\raw\t10k-labels-idx1-ubyte.gz to .\FashionMNIST\raw
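The training `DataLoader` above uses the default `shuffle=False`, so every epoch sees the batches in the same order. A common variation is to shuffle the training data each epoch. The snippet below is a minimal sketch of that option; `fake_dataset` is a hypothetical synthetic stand-in so the example runs without downloading anything, and the numbers reported in this notebook were obtained without shuffling.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

BATCH_SIZE = 256

# Synthetic stand-in for FashionMNIST (assumption, not the real data):
# 1000 grayscale 28x28 images with labels from 10 classes.
fake_images = torch.rand(1000, 1, 28, 28)
fake_labels = torch.randint(0, 10, (1000,))
fake_dataset = TensorDataset(fake_images, fake_labels)

# shuffle=True draws a new random sample order at the start of every epoch.
shuffled_loader = DataLoader(fake_dataset, batch_size=BATCH_SIZE, shuffle=True)

X, y = next(iter(shuffled_loader))
num_batches = len(shuffled_loader)  # the last batch is smaller than BATCH_SIZE
print(tuple(X.shape), tuple(y.shape), num_batches)
```

The test loader can stay unshuffled, since the order of samples does not affect the evaluation metrics.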






In [4]:
train_dataset.class_to_idx

{'T-shirt/top': 0,
 'Trouser': 1,
 'Pullover': 2,
 'Dress': 3,
 'Coat': 4,
 'Sandal': 5,
 'Shirt': 6,
 'Sneaker': 7,
 'Bag': 8,
 'Ankle boot': 9}

In [5]:
train_dataset.data.shape

torch.Size([60000, 28, 28])
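Each image is 28x28 pixels, which is where the input size of 784 in the first `Linear` layer of the models below comes from. The following sketch uses a random tensor (an assumption, standing in for a real batch produced by `ToTensor()`) to show how `torch.nn.Flatten` keeps the batch dimension and merges the rest.

```python
import torch

# Hypothetical batch shaped like the loader output: [batch, channels, height, width].
batch = torch.rand(256, 1, 28, 28)

# Flatten() keeps dimension 0 (the batch) and merges the rest: 1 * 28 * 28 = 784.
flat = torch.nn.Flatten()(batch)
print(flat.shape)  # torch.Size([256, 784])
```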

### Baseline model (SGD)

In [6]:
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10)
)

In [7]:
model

Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=784, out_features=256, bias=True)
  (2): ReLU()
  (3): Linear(in_features=256, out_features=10, bias=True)
)
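To get a feel for the size of this baseline, the trainable parameters can be counted directly. This is a small sketch that rebuilds the same architecture; the variable names `baseline` and `num_params` are illustrative and not part of the notebook.

```python
import torch

baseline = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)

# Weights plus biases: (784 * 256 + 256) + (256 * 10 + 10) = 203530.
num_params = sum(p.numel() for p in baseline.parameters() if p.requires_grad)
print(num_params)  # 203530
```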

In [8]:
loss = torch.nn.CrossEntropyLoss()
trainer = torch.optim.SGD(model.parameters(), lr=.01)
num_epochs = 10

In [9]:
def train_model():
    """Train the global `model` with the global `trainer` and print metrics after every epoch."""
    for ep in range(num_epochs):
        train_iters, train_passed = 0, 0
        train_loss, train_acc = 0., 0.
        start = time.time()

        # Training pass: layers such as dropout and batch norm run in training mode.
        model.train()
        for X, y in train:
            trainer.zero_grad()
            y_pred = model(X)
            l = loss(y_pred, y)
            l.backward()
            trainer.step()
            train_loss += l.item()
            train_acc += (y_pred.argmax(dim=1) == y).sum().item()
            train_iters += 1
            train_passed += len(X)

        # Evaluation pass: switch to eval mode and skip gradient tracking.
        test_iters, test_passed = 0, 0
        test_loss, test_acc = 0., 0.
        model.eval()
        with torch.no_grad():
            for X, y in test:
                y_pred = model(X)
                l = loss(y_pred, y)
                test_loss += l.item()
                test_acc += (y_pred.argmax(dim=1) == y).sum().item()
                test_iters += 1
                test_passed += len(X)

        # Loss is averaged over batches, accuracy over samples.
        print("ep: {}, taked: {:.3f}, train_loss: {:.3f}, train_acc: {:.3f}, test_loss: {:.3f}, test_acc: {:.3f}"
              .format(ep, time.time() - start,
                      train_loss / train_iters,
                      train_acc / train_passed,
                      test_loss / test_iters,
                      test_acc / test_passed)
             )
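The evaluation half of the loop above can also be written as a standalone helper that takes the model and loader as arguments instead of relying on globals. The sketch below is one possible shape for it, not the notebook's own code: `evaluate`, `demo_model` and `demo_loader` are hypothetical names, and the data is random so that the example runs without the dataset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, loss_fn):
    """Return (mean batch loss, accuracy) of `model` on `loader`."""
    model.eval()  # use inference behaviour for dropout / batch norm
    total_loss, correct, seen, batches = 0.0, 0, 0, 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for X, y in loader:
            logits = model(X)
            total_loss += loss_fn(logits, y).item()
            correct += (logits.argmax(dim=1) == y).sum().item()
            seen += len(X)
            batches += 1
    return total_loss / batches, correct / seen


# Synthetic data and an untrained model, used only to exercise the helper.
torch.manual_seed(0)
demo_loader = DataLoader(
    TensorDataset(torch.rand(512, 1, 28, 28), torch.randint(0, 10, (512,))),
    batch_size=256,
)
demo_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 10))
demo_loss, demo_acc = evaluate(demo_model, demo_loader, torch.nn.CrossEntropyLoss())
print(f"loss: {demo_loss:.3f}, acc: {demo_acc:.3f}")
```

On random labels the accuracy should hover around chance level (about 0.1 for 10 classes), which is a quick sanity check that the helper is wired correctly.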

In [10]:
train_model()

ep: 0, taked: 7.211, train_loss: 1.737, train_acc: 0.579, test_loss: 1.273, test_acc: 0.655
ep: 1, taked: 6.950, train_loss: 1.064, train_acc: 0.675, test_loss: 0.939, test_acc: 0.680
ep: 2, taked: 7.203, train_loss: 0.859, train_acc: 0.710, test_loss: 0.815, test_acc: 0.713
ep: 3, taked: 6.821, train_loss: 0.767, train_acc: 0.740, test_loss: 0.746, test_acc: 0.737
ep: 4, taked: 6.909, train_loss: 0.709, train_acc: 0.761, test_loss: 0.698, test_acc: 0.756
ep: 5, taked: 7.013, train_loss: 0.667, train_acc: 0.777, test_loss: 0.662, test_acc: 0.770
ep: 6, taked: 6.812, train_loss: 0.635, train_acc: 0.789, test_loss: 0.634, test_acc: 0.780
ep: 7, taked: 7.186, train_loss: 0.609, train_acc: 0.798, test_loss: 0.612, test_acc: 0.787
ep: 8, taked: 7.031, train_loss: 0.588, train_acc: 0.804, test_loss: 0.594, test_acc: 0.795
ep: 9, taked: 7.287, train_loss: 0.570, train_acc: 0.810, test_loss: 0.579, test_acc: 0.801


### Model (Adam)

In [11]:
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10)
)

In [12]:
trainer = torch.optim.Adam(model.parameters(), lr=.01)
train_model()

ep: 0, taked: 7.775, train_loss: 0.522, train_acc: 0.812, test_loss: 0.435, test_acc: 0.839
ep: 1, taked: 7.405, train_loss: 0.376, train_acc: 0.863, test_loss: 0.425, test_acc: 0.849
ep: 2, taked: 7.554, train_loss: 0.346, train_acc: 0.872, test_loss: 0.403, test_acc: 0.857
ep: 3, taked: 7.785, train_loss: 0.330, train_acc: 0.879, test_loss: 0.381, test_acc: 0.863
ep: 4, taked: 7.347, train_loss: 0.316, train_acc: 0.883, test_loss: 0.389, test_acc: 0.861
ep: 5, taked: 7.439, train_loss: 0.304, train_acc: 0.887, test_loss: 0.406, test_acc: 0.857
ep: 6, taked: 7.519, train_loss: 0.293, train_acc: 0.892, test_loss: 0.429, test_acc: 0.855
ep: 7, taked: 7.667, train_loss: 0.286, train_acc: 0.893, test_loss: 0.392, test_acc: 0.870
ep: 8, taked: 7.179, train_loss: 0.278, train_acc: 0.896, test_loss: 0.419, test_acc: 0.864
ep: 9, taked: 7.402, train_loss: 0.271, train_acc: 0.899, test_loss: 0.394, test_acc: 0.869


### Model (Adam) + additional layers

In [13]:
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10)
)

In [14]:
trainer = torch.optim.Adam(model.parameters(), lr=.01)
train_model()

ep: 0, taked: 8.192, train_loss: 0.598, train_acc: 0.781, test_loss: 0.431, test_acc: 0.842
ep: 1, taked: 8.358, train_loss: 0.394, train_acc: 0.856, test_loss: 0.407, test_acc: 0.856
ep: 2, taked: 8.559, train_loss: 0.367, train_acc: 0.865, test_loss: 0.399, test_acc: 0.859
ep: 3, taked: 9.034, train_loss: 0.341, train_acc: 0.875, test_loss: 0.393, test_acc: 0.860
ep: 4, taked: 9.516, train_loss: 0.329, train_acc: 0.877, test_loss: 0.399, test_acc: 0.860
ep: 5, taked: 9.455, train_loss: 0.315, train_acc: 0.882, test_loss: 0.404, test_acc: 0.865
ep: 6, taked: 10.607, train_loss: 0.308, train_acc: 0.887, test_loss: 0.432, test_acc: 0.857
ep: 7, taked: 9.691, train_loss: 0.308, train_acc: 0.886, test_loss: 0.410, test_acc: 0.860
ep: 8, taked: 9.284, train_loss: 0.295, train_acc: 0.891, test_loss: 0.438, test_acc: 0.862
ep: 9, taked: 9.442, train_loss: 0.292, train_acc: 0.893, test_loss: 0.408, test_acc: 0.866


### Model (Adam) + additional layers + batch normalization

In [15]:
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784, 512),
    torch.nn.ReLU(),
    torch.nn.BatchNorm1d(512),
    torch.nn.Linear(512, 256),
    torch.nn.ReLU(),
    torch.nn.BatchNorm1d(256),
    torch.nn.Linear(256, 128),
    torch.nn.ReLU(),
    torch.nn.BatchNorm1d(128),
    torch.nn.Linear(128, 10)
)
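Batch normalization behaves differently in training and evaluation mode, which is why the `model.train()` and `model.eval()` calls in the training loop matter for this architecture. The sketch below illustrates the difference on a single `BatchNorm1d` layer with random input; the tensor sizes and names are assumptions chosen for illustration.

```python
import torch

torch.manual_seed(0)
bn = torch.nn.BatchNorm1d(4)

# Random features with a clearly non-zero mean and non-unit variance.
x = torch.randn(256, 4) * 3 + 5

bn.train()
out_train = bn(x)  # normalized with the statistics of this batch
train_mean = out_train.mean(dim=0)

bn.eval()
out_eval = bn(x)   # normalized with the running statistics collected so far

print(train_mean)             # close to zero for every feature
print(out_eval.mean(dim=0))   # not zero: running stats have seen only one batch
```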

In [16]:
trainer = torch.optim.Adam(model.parameters(), lr=.01)
train_model()

ep: 0, taked: 8.697, train_loss: 0.469, train_acc: 0.827, test_loss: 0.453, test_acc: 0.822
ep: 1, taked: 8.667, train_loss: 0.372, train_acc: 0.862, test_loss: 0.406, test_acc: 0.850
ep: 2, taked: 8.515, train_loss: 0.336, train_acc: 0.876, test_loss: 0.395, test_acc: 0.856
ep: 3, taked: 8.929, train_loss: 0.313, train_acc: 0.883, test_loss: 0.382, test_acc: 0.860
ep: 4, taked: 8.994, train_loss: 0.293, train_acc: 0.891, test_loss: 0.375, test_acc: 0.864
ep: 5, taked: 8.718, train_loss: 0.279, train_acc: 0.897, test_loss: 0.368, test_acc: 0.862
ep: 6, taked: 9.073, train_loss: 0.268, train_acc: 0.901, test_loss: 0.371, test_acc: 0.870
ep: 7, taked: 9.417, train_loss: 0.252, train_acc: 0.906, test_loss: 0.356, test_acc: 0.876
ep: 8, taked: 8.873, train_loss: 0.247, train_acc: 0.907, test_loss: 0.386, test_acc: 0.880
ep: 9, taked: 8.875, train_loss: 0.241, train_acc: 0.910, test_loss: 0.385, test_acc: 0.877


### Model (RMSprop) + additional layers + batch normalization

In [17]:
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784, 512),
    torch.nn.ReLU(),
    torch.nn.BatchNorm1d(512),
    torch.nn.Linear(512, 256),
    torch.nn.ReLU(),
    torch.nn.BatchNorm1d(256),
    torch.nn.Linear(256, 128),
    torch.nn.ReLU(),
    torch.nn.BatchNorm1d(128),
    torch.nn.Linear(128, 10)
)

In [18]:
trainer = torch.optim.RMSprop(model.parameters(), lr=.01)
train_model()

ep: 0, taked: 8.660, train_loss: 0.572, train_acc: 0.796, test_loss: 0.596, test_acc: 0.781
ep: 1, taked: 8.532, train_loss: 0.387, train_acc: 0.856, test_loss: 0.610, test_acc: 0.794
ep: 2, taked: 8.678, train_loss: 0.350, train_acc: 0.871, test_loss: 0.463, test_acc: 0.840
ep: 3, taked: 8.171, train_loss: 0.321, train_acc: 0.881, test_loss: 0.621, test_acc: 0.798
ep: 4, taked: 8.675, train_loss: 0.297, train_acc: 0.888, test_loss: 0.563, test_acc: 0.811
ep: 5, taked: 8.492, train_loss: 0.280, train_acc: 0.895, test_loss: 0.419, test_acc: 0.854
ep: 6, taked: 8.169, train_loss: 0.262, train_acc: 0.903, test_loss: 0.521, test_acc: 0.837
ep: 7, taked: 8.208, train_loss: 0.250, train_acc: 0.906, test_loss: 0.472, test_acc: 0.848
ep: 8, taked: 8.715, train_loss: 0.238, train_acc: 0.910, test_loss: 0.412, test_acc: 0.869
ep: 9, taked: 8.427, train_loss: 0.228, train_acc: 0.914, test_loss: 0.404, test_acc: 0.860


### Model (Adam) + additional layers + batch normalization + dropout

In [19]:
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784, 512),
    torch.nn.BatchNorm1d(512),
    torch.nn.ReLU(),
    torch.nn.Dropout(0.2),
    torch.nn.Linear(512, 256),
    torch.nn.BatchNorm1d(256),
    torch.nn.ReLU(),
    torch.nn.Dropout(0.2),
    torch.nn.Linear(256, 128),
    torch.nn.BatchNorm1d(128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10)
)
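`Dropout(0.2)` is only active in training mode: it zeroes each activation with probability 0.2 and scales the survivors by 1 / (1 - 0.2) = 1.25, while in evaluation mode it passes the input through unchanged. A minimal sketch on a tensor of ones (the input and variable names are illustrative assumptions):

```python
import torch

torch.manual_seed(0)
drop = torch.nn.Dropout(0.2)
x = torch.ones(1000, 8)

drop.train()
out_train = drop(x)  # entries are either 0 or 1.25
zero_fraction = (out_train == 0).float().mean().item()

drop.eval()
out_eval = drop(x)   # identity in evaluation mode

print(f"zeroed in train mode: {zero_fraction:.3f}")
print(torch.equal(out_eval, x))
```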

In [20]:
trainer = torch.optim.Adam(model.parameters(), lr=.01)
train_model()

ep: 0, taked: 8.893, train_loss: 0.499, train_acc: 0.819, test_loss: 0.425, test_acc: 0.843
ep: 1, taked: 9.415, train_loss: 0.375, train_acc: 0.863, test_loss: 0.378, test_acc: 0.858
ep: 2, taked: 8.732, train_loss: 0.334, train_acc: 0.878, test_loss: 0.375, test_acc: 0.858
ep: 3, taked: 8.675, train_loss: 0.310, train_acc: 0.886, test_loss: 0.342, test_acc: 0.873
ep: 4, taked: 8.797, train_loss: 0.289, train_acc: 0.893, test_loss: 0.340, test_acc: 0.875
ep: 5, taked: 8.874, train_loss: 0.274, train_acc: 0.897, test_loss: 0.355, test_acc: 0.867
ep: 6, taked: 8.782, train_loss: 0.260, train_acc: 0.903, test_loss: 0.334, test_acc: 0.879
ep: 7, taked: 8.658, train_loss: 0.247, train_acc: 0.908, test_loss: 0.334, test_acc: 0.880
ep: 8, taked: 9.334, train_loss: 0.237, train_acc: 0.911, test_loss: 0.343, test_acc: 0.878
ep: 9, taked: 8.779, train_loss: 0.229, train_acc: 0.914, test_loss: 0.330, test_acc: 0.885
