From Switch Transformer paper:

>In deep learning, models typically reuse the same parameters for all inputs. Mixture of Experts (MoE) defies this and instead selects different parameters for each incoming example. The result is a sparsely-activated model -- with outrageous numbers of parameters -- but a constant computational cost.

A vanilla Transformer block looks like this:

```python
class ModernTransformerBlock(nn.Module):
    def __init__(self, embed_dim, n_heads, up):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, n_heads)
        self.mlp = nn.Sequential(
            SwishGLU(embed_dim, embed_dim * up),
            nn.Linear(embed_dim * up, embed_dim),
        )
        self.pre_attn_norm = RMSNorm(embed_dim)
        self.pre_mlp_norm = RMSNorm(embed_dim)
    
    def forward(self, x):
        x = x + self.attn(self.pre_attn_norm(x))
        x = x + self.mlp(self.pre_mlp_norm(x))
        return x
```

The Mixture-of-Experts layer replaces the MLP layer. Instead of having one MLP layer, we have `num_experts` different MLP layers called *experts*.

The idea is to process a contextualized token, by sending it to a subset of experts. In this way we could efficiently increase the number of parameters of the model without affecting computational cost too much.

First, the token is fed into *router*, which determines to which experts a token should go to be processed. For computational reasons, there is a fixed limit on:
* how many tokens an expert can process, and
* by how many experts a token is processed.

# Grading
Your task is to implement a Mixture of Experts layer. You can get points for the following subtasks:
1.  (5 points) Naive implementation of MoE layer that works with `num_experts_per_token>=1`
2.  (5 points) Well-vectorized implementation of MoE layer that works with `num_experts_per_token=1`
3.  (5 points) Implementation of a script testing for 1. 2. implementations output equivalence and performance superiority of 2.
4.  (5 points) Well-vectorized implementation of MoE layer that works with `num_experts_per_token>=1`
5.  (Bonus 5 points) Use Huggingface's Trainer class and compare performance of randomly initialized MoE Transformer and standard Transformer on `https://huggingface.co/datasets/imdb` dataset.

20 points scored in this task is equivalent to at least 16% points achievable in this course.

Please submit your assignments until 15th of April, 18:00 CET.

# Rules
- You shouldn't change basic `forward` and `initialization` signatures of the main classes: `Router` and `MoE`. You can add additional arguments with default values.
- As an assignment, provide a Jupyter notebook with a short introduction at the top of what has been done and where.
- You can add or remove any other classes, though you should keep the behaviour of `MLP` class somehow.
- Sensible vectorization is good enough for the maximum amount of points. There is no need to optimize performance to the max, just show that you can identify opportunities for vectorization and you are able to implement complex vectorizations.
- If in doubt, direct questions to either Jan Ludziejewski or Juliusz Straszyński.
- A notebook that is hard to grade (crashing, obfuscated) might be scored for 0 points.

# Hints
- First, write a naive implementation, vectorized operations might be hard to analyze for correctness.
- You can make randomness deterministic by appropriate torch functions.
- If you have a hard time fulfilling fair randomness for token discarding, you can try keeping the earlier tokens.

In [1]:
%pip install torch_tb_profiler einops

Collecting torch_tb_profiler
  Downloading torch_tb_profiler-0.4.3-py3-none-any.whl (1.1 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.1/1.1 MB[0m [31m5.3 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting einops
  Downloading einops-0.7.0-py3-none-any.whl (44 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m44.6/44.6 kB[0m [31m5.3 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: einops, torch_tb_profiler
Successfully installed einops-0.7.0 torch_tb_profiler-0.4.3


In [2]:
from torch import nn
import torch
from transformers import PretrainedConfig
import torch.nn.functional as F
from einops import einsum

class MLP(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(config.hidden_size, config.intermediate_size),
            nn.ReLU(),
            nn.Linear(config.intermediate_size, config.hidden_size),
        )

    def forward(self, x):
        return self.mlp(x)

# Router
The router is a module which assigns tokens to experts. It answers two questions:
1. Which tokens should be assigned to which expert.
2. How much weight should be assigned to each expert. The weight is determined by similarity between the token embedding and the expert embedding

The following conditions must be satisfied:
1. The routing weights must sum to 1 for each token and be non-negative
2. A token should have exactly `num_experts_per_token` non-zero weights

In [3]:
# Input: [batch_size, seq_len, hidden_size] - input embeddings
# Output: [batch_size, seq_len, num_experts] - expert routing weights
class Router(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.num_experts_per_token = config.num_experts_per_token
        self.hidden_size = config.hidden_size
        self.num_experts = config.num_experts

        self.expert_embeddings = nn.Parameter(torch.randn(self.num_experts, self.hidden_size))
        torch.nn.init.kaiming_uniform_(self.expert_embeddings, nonlinearity='linear')

    def forward(self, x):
        pass

The MoE module is a module which wraps around a set of expert modules and a router module.

It takes input embeddings and routes them to the experts.

Each token is processed individually by a subset of experts.

The output token embedding is a weighted sum of the expert outputs.

The weights are determined by the router module.

The subset of experts is determined by non-zero weights in the routing output.

Additionally each expert might process at most `expert_capacity = ceil((batch_size * seq_len) / num_experts * capacity_factor)` tokens

Superfluous tokens to be discarded by a particular expert should be selected uniformly at random.

Discarding should be equivalent to setting the appropriate routing weight to 0, while other weights remain the same.

This means that a token is processed by at most num_experts_per_token experts with a sum of weights of at most 1.

Specifically, this could mean that a token is processed by 0 experts - in this case the resulting embedding should be a zero tensor.

In [4]:
import math

# Input: [batch_size, seq_len, hidden_size] - input embeddings
# Output: [batch_size, seq_len, hidden_size] - output embeddings
class MoE(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.num_experts = config.num_experts
        self.hidden_size = config.hidden_size
        self.num_experts_per_token = config.num_experts_per_token
        self.capacity_factor = config.capacity_factor

        # You can change experts representation if you want
        self.experts = nn.ModuleList([MLP(config) for _ in range(self.num_experts)])
        self.router = Router(config)

    def forward(self, x):
        batch_size, seq_len, hidden_size = x.shape
        expert_capacity = math.ceil(batch_size * seq_len / self.num_experts * self.capacity_factor)
        pass

# Configurations

In [5]:
base_config = dict(
    vocab_size=5000,
    max_position_embeddings=256,
    num_attention_heads=8,
    num_hidden_layers=4,
    hidden_dropout_prob=0.1,
    hidden_size=128,
    intermediate_size=512,
    num_labels=2
)

standard_config = PretrainedConfig(
    **base_config,
    ff_cls=MLP
)

moe_config = PretrainedConfig(
    **base_config,
    num_experts=4,
    capacity_factor=2.0,
    num_experts_per_token=1,
    ff_cls=MoE
)

# Basic Transformer-related classes

In [6]:
from einops import rearrange

class Embedding(nn.Module):
  def __init__(self, config):
    super(Embedding, self).__init__()
    self.word_embed = nn.Embedding(config.vocab_size, config.hidden_size)
    self.pos_embed = nn.Embedding(config.max_position_embeddings, config.hidden_size)
    self.dropout = nn.Dropout(config.hidden_dropout_prob)

  def forward(self, x):
    batch_size, seq_length = x.shape
    device = x.device
    positions = torch.arange(0, seq_length).expand(
        batch_size, seq_length).to(device)
    embedding = self.word_embed(x) + self.pos_embed(positions)
    return self.dropout(embedding)


class MHSelfAttention(nn.Module):
    def __init__(self, config: PretrainedConfig):
        super(MHSelfAttention, self).__init__()
        self.num_attention_heads = config.num_attention_heads
        self.hidden_size = config.hidden_size
        self.head_size = self.hidden_size // self.num_attention_heads
        self.num_attention_heads = config.num_attention_heads
        self.qkv = nn.Linear(self.hidden_size, 3 * self.hidden_size, bias=False)

    def forward(self, embeddings):
        batch_size, seq_length, hidden_size = embeddings.size()

        result = self.qkv(embeddings)
        q, k, v = rearrange(result, 'b s (qkv nah hdsz) -> qkv b nah s hdsz', nah=self.num_attention_heads, qkv=3).unbind(0)

        attention_scores = torch.matmul(q, k.transpose(-1, -2))
        attention_scores = attention_scores / math.sqrt(hidden_size)
        attention_probs = nn.Softmax(dim=-1)(attention_scores)

        contextualized_layer = torch.matmul(attention_probs, v)

        outputs = rearrange(contextualized_layer, 'b nah s hdsz -> b s (nah hdsz)')
        return outputs

class TransformerBlock(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.attention = MHSelfAttention(config)
        self.norm1 = nn.LayerNorm(config.hidden_size)
        self.norm2 = nn.LayerNorm(config.hidden_size)
        self.intermediate = config.ff_cls(config)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)

    def forward(self, x):
        x =  x + self.norm1(self.dropout(self.attention(x)))
        x =  x + self.norm2(self.dropout(self.intermediate(x)))
        return x

class TransformerClassifier(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.embeddings = Embedding(config)
        self.layer = nn.Sequential(*[TransformerBlock(config) for _ in range(config.num_hidden_layers)])
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

    def forward(self, input_ids, labels=None):
        embedding_output = self.embeddings(input_ids)
        encoding = self.layer(embedding_output)
        pooled_encoding = encoding.mean(dim=1)
        logits = self.classifier(pooled_encoding)
        loss = F.cross_entropy(logits, labels) if labels is not None else None
        return {
            'loss': loss,
            'logits': logits,
        }

# Tokenizer training

In [7]:
%pip install datasets

Collecting datasets
  Downloading datasets-2.18.0-py3-none-any.whl (510 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m510.5/510.5 kB[0m [31m3.9 MB/s[0m eta [36m0:00:00[0m
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl (116 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m116.3/116.3 kB[0m [31m9.8 MB/s[0m eta [36m0:00:00[0m
Collecting xxhash (from datasets)
  Downloading xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m194.1/194.1 kB[0m [31m19.8 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting multiprocess (from datasets)
  Downloading multiprocess-0.70.16-py310-none-any.whl (134 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m134.8/134.8 kB[0m [31m15.1 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: xxhash, dill, multiprocess, datasets
Successfully installed datasets

In [8]:
from tokenizers import ByteLevelBPETokenizer
from datasets import load_dataset
from tokenizers.processors import BertProcessing

dataset = load_dataset('imdb')

tokenizer = ByteLevelBPETokenizer()
tokenizer.train_from_iterator(
    dataset['train']['text'],
    vocab_size=base_config['vocab_size'],
    special_tokens=["<s>", "</s>", "<pad>"],
    min_frequency=2
)
tokenizer.post_processor = BertProcessing(
    ("</s>", tokenizer.token_to_id("</s>")),
    ("<s>", tokenizer.token_to_id("<s>")),
)

tokenizer.enable_truncation(max_length=base_config['max_position_embeddings'])
tokenizer.enable_padding(pad_id=tokenizer.token_to_id("<pad>"), pad_token="<pad>", length=base_config['max_position_embeddings'])
tokenizer.model_max_length = base_config['max_position_embeddings']
tokenizer.pad_token = "<pad>"

from transformers import Trainer, TrainingArguments

def tokenize(row):
    return {
        'input_ids': tokenizer.encode(row['text']).ids,
    }

tokenized_dataset = dataset.map(tokenize)

Downloading readme:   0%|          | 0.00/7.81k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/21.0M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/20.5M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/42.0M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/25000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/25000 [00:00<?, ? examples/s]

Generating unsupervised split:   0%|          | 0/50000 [00:00<?, ? examples/s]

Map:   0%|          | 0/25000 [00:00<?, ? examples/s]

Map:   0%|          | 0/25000 [00:00<?, ? examples/s]

Map:   0%|          | 0/50000 [00:00<?, ? examples/s]

# **1. Naive implementation of MoE layer that works with num_experts_per_token>=1**

In [35]:
# Input: [batch_size, seq_len, hidden_size] - input embeddings
# Output: [batch_size, seq_len, num_experts] - expert routing weights
class Router(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.num_experts_per_token = config.num_experts_per_token
        self.hidden_size = config.hidden_size
        self.num_experts = config.num_experts

        self.expert_embeddings = nn.Parameter(torch.randn(self.num_experts, self.hidden_size))
        torch.nn.init.kaiming_uniform_(self.expert_embeddings, nonlinearity='linear')

    def forward(self, x):
        batch_size, seq_len, hidden_size = x.shape
        similarity = einsum(x, self.expert_embeddings, 'b s h, e h -> b s e')
        top_experts = torch.topk(similarity, self.num_experts_per_token)
        softmaxed_topk_values = F.softmax(top_experts.values, dim=-1)
        mask = torch.zeros_like(similarity, dtype=torch.bool)
        mask = mask.scatter_(-1, top_experts.indices, 1)
        routing_weights = torch.zeros_like(similarity)
        routing_weights[mask] = softmaxed_topk_values.flatten()

        return routing_weights

In [36]:
# Input: [batch_size, seq_len, hidden_size] - input embeddings
# Output: [batch_size, seq_len, hidden_size] - output embeddings
class NaiveMoE(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.num_experts = config.num_experts
        self.hidden_size = config.hidden_size
        self.num_experts_per_token = config.num_experts_per_token
        self.capacity_factor = config.capacity_factor

        # You can change experts representation if you want
        self.experts = nn.ModuleList([MLP(config) for _ in range(self.num_experts)])
        self.router = Router(config)

    def forward(self, x):
        batch_size, seq_len, hidden_size = x.shape
        expert_capacity = torch.ceil(torch.tensor(batch_size * seq_len / self.num_experts * self.capacity_factor, device=x.device, dtype=torch.int))
        routing_weights = self.router(x)
        for i in range(self.num_experts):
            token_indices = torch.nonzero(routing_weights[:, :, i], as_tuple=False)
            if token_indices.shape[0] > expert_capacity:
                routing_weights[token_indices[expert_capacity:, 0], token_indices[expert_capacity:, 1], i] = 0

        expert_outputs = torch.zeros(batch_size, seq_len, self.hidden_size, device=x.device)
        for i in range(self.num_experts):
            expert_indices = torch.nonzero(routing_weights[:, :, i], as_tuple=False)
            expert_outputs[expert_indices[:, 0], expert_indices[:, 1]] = self.experts[i](x[expert_indices[:, 0], expert_indices[:, 1]])

        return expert_outputs

In [37]:
from torch.utils.data import DataLoader

naive_moe_config = PretrainedConfig(
    **base_config,
    num_experts=4,
    capacity_factor=2.0,
    num_experts_per_token=2,
    ff_cls=NaiveMoE
)

train_loader = DataLoader(tokenized_dataset['train'], batch_size=16, shuffle=True)
test_loader = DataLoader(tokenized_dataset['test'], batch_size=16, shuffle=False)

DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
model = TransformerClassifier(naive_moe_config).to(DEVICE)
# model = TransformerClassifier(standard_config).to(DEVICE)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)

In [39]:
from tqdm import tqdm

NUM_OF_EPOCHS = 20

for epoch in range(NUM_OF_EPOCHS):
    model.train()
    train_progress_bar = tqdm(train_loader, desc=f'Train, Epoch {epoch + 1} / {NUM_OF_EPOCHS}')
    running_loss = 0.
    for i, batch in enumerate(train_progress_bar):
        x, y = batch['input_ids'], batch['label']
        x = torch.stack(x, dim=1).to(DEVICE)
        y = y.to(DEVICE)
        optimizer.zero_grad()
        loss = model(x, y)['loss']
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

        if i % 10 == 9:
            last_loss = running_loss / 10 # avg loss per batch
            print('batch {} loss: {}'.format(i + 1, last_loss))
            running_loss = 0.

    model.eval()
    with torch.no_grad():
        total_loss = 0
        total_samples = 0
        correct_samples = 0
        test_progress_bar = tqdm(test_loader, desc=f'Test, Epoch {epoch + 1} / {NUM_OF_EPOCHS}')
        for batch in test_progress_bar:
            x, y = batch['input_ids'], batch['label']
            x = torch.stack(x, dim=1).to(DEVICE)
            y = y.to(DEVICE)
            logits = model(x)['logits']
            total_loss += F.cross_entropy(logits, y, reduction='sum').item()
            total_samples += y.shape[0]
            correct_samples += (logits.argmax(dim=-1) == y).sum().item()

        print(f'Epoch {epoch + 1}, loss: {total_loss / total_samples}, accuracy: {correct_samples / total_samples}')

Train, Epoch 1 / 20:   1%|          | 13/1563 [00:00<01:11, 21.60it/s]

batch 10 loss: 0.37670093178749087


Train, Epoch 1 / 20:   1%|▏         | 22/1563 [00:01<01:09, 22.30it/s]

batch 20 loss: 0.2251775212585926


Train, Epoch 1 / 20:   2%|▏         | 34/1563 [00:01<01:08, 22.42it/s]

batch 30 loss: 0.28054225295782087


Train, Epoch 1 / 20:   3%|▎         | 43/1563 [00:01<01:08, 22.28it/s]

batch 40 loss: 0.27680127546191213


Train, Epoch 1 / 20:   3%|▎         | 52/1563 [00:02<01:08, 22.07it/s]

batch 50 loss: 0.24991972297430037


Train, Epoch 1 / 20:   4%|▍         | 64/1563 [00:02<01:08, 21.86it/s]

batch 60 loss: 0.29959657192230227


Train, Epoch 1 / 20:   5%|▍         | 73/1563 [00:03<01:08, 21.80it/s]

batch 70 loss: 0.3274465538561344


Train, Epoch 1 / 20:   5%|▌         | 82/1563 [00:03<01:14, 19.91it/s]

batch 80 loss: 0.23757996410131454


Train, Epoch 1 / 20:   6%|▌         | 93/1563 [00:04<01:15, 19.36it/s]

batch 90 loss: 0.25918947011232374


Train, Epoch 1 / 20:   7%|▋         | 102/1563 [00:04<01:18, 18.60it/s]

batch 100 loss: 0.29604374468326566


Train, Epoch 1 / 20:   7%|▋         | 112/1563 [00:05<01:21, 17.84it/s]

batch 110 loss: 0.19762752577662468


Train, Epoch 1 / 20:   8%|▊         | 123/1563 [00:05<01:07, 21.46it/s]

batch 120 loss: 0.273972886800766


Train, Epoch 1 / 20:   8%|▊         | 132/1563 [00:06<01:04, 22.15it/s]

batch 130 loss: 0.3083727121353149


Train, Epoch 1 / 20:   9%|▉         | 144/1563 [00:06<01:02, 22.71it/s]

batch 140 loss: 0.2684804856777191


Train, Epoch 1 / 20:  10%|▉         | 153/1563 [00:07<01:02, 22.39it/s]

batch 150 loss: 0.2643652230501175


Train, Epoch 1 / 20:  10%|█         | 162/1563 [00:07<01:01, 22.76it/s]

batch 160 loss: 0.2715693160891533


Train, Epoch 1 / 20:  11%|█         | 174/1563 [00:08<01:00, 22.78it/s]

batch 170 loss: 0.28998180031776427


Train, Epoch 1 / 20:  12%|█▏        | 183/1563 [00:08<01:00, 22.79it/s]

batch 180 loss: 0.32062784880399703


Train, Epoch 1 / 20:  12%|█▏        | 192/1563 [00:08<01:00, 22.80it/s]

batch 190 loss: 0.2779973141849041


Train, Epoch 1 / 20:  13%|█▎        | 204/1563 [00:09<01:00, 22.58it/s]

batch 200 loss: 0.2980372652411461


Train, Epoch 1 / 20:  14%|█▎        | 213/1563 [00:09<00:59, 22.86it/s]

batch 210 loss: 0.272040419280529


Train, Epoch 1 / 20:  14%|█▍        | 222/1563 [00:10<01:00, 22.13it/s]

batch 220 loss: 0.31135295182466505


Train, Epoch 1 / 20:  15%|█▍        | 234/1563 [00:10<00:59, 22.31it/s]

batch 230 loss: 0.3567864328622818


Train, Epoch 1 / 20:  16%|█▌        | 243/1563 [00:11<00:58, 22.50it/s]

batch 240 loss: 0.27354110553860667


Train, Epoch 1 / 20:  16%|█▌        | 252/1563 [00:11<00:57, 22.70it/s]

batch 250 loss: 0.3081044271588326


Train, Epoch 1 / 20:  17%|█▋        | 264/1563 [00:12<00:58, 22.37it/s]

batch 260 loss: 0.39756243824958803


Train, Epoch 1 / 20:  17%|█▋        | 273/1563 [00:12<00:57, 22.51it/s]

batch 270 loss: 0.32430557161569595


Train, Epoch 1 / 20:  18%|█▊        | 282/1563 [00:12<00:56, 22.61it/s]

batch 280 loss: 0.2239391840994358


Train, Epoch 1 / 20:  19%|█▉        | 294/1563 [00:13<00:56, 22.35it/s]

batch 290 loss: 0.3188295543193817


Train, Epoch 1 / 20:  19%|█▉        | 303/1563 [00:13<00:57, 22.06it/s]

batch 300 loss: 0.2285294145345688


Train, Epoch 1 / 20:  20%|█▉        | 312/1563 [00:14<00:56, 22.23it/s]

batch 310 loss: 0.233743679150939


Train, Epoch 1 / 20:  21%|██        | 324/1563 [00:14<00:54, 22.59it/s]

batch 320 loss: 0.3364263460040092


Train, Epoch 1 / 20:  21%|██▏       | 333/1563 [00:15<00:55, 22.22it/s]

batch 330 loss: 0.3921950817108154


Train, Epoch 1 / 20:  22%|██▏       | 342/1563 [00:15<00:59, 20.53it/s]

batch 340 loss: 0.358116452395916


Train, Epoch 1 / 20:  23%|██▎       | 353/1563 [00:16<01:00, 19.91it/s]

batch 350 loss: 0.3057222366333008


Train, Epoch 1 / 20:  23%|██▎       | 361/1563 [00:16<00:58, 20.48it/s]

batch 360 loss: 0.25651163309812547


Train, Epoch 1 / 20:  24%|██▍       | 372/1563 [00:17<01:04, 18.49it/s]

batch 370 loss: 0.30241258814930916


Train, Epoch 1 / 20:  24%|██▍       | 382/1563 [00:17<00:58, 20.31it/s]

batch 380 loss: 0.38310854136943817


Train, Epoch 1 / 20:  25%|██▌       | 394/1563 [00:18<00:53, 21.98it/s]

batch 390 loss: 0.40808776915073397


Train, Epoch 1 / 20:  26%|██▌       | 403/1563 [00:18<00:52, 22.04it/s]

batch 400 loss: 0.24425121545791625


Train, Epoch 1 / 20:  26%|██▋       | 412/1563 [00:19<00:51, 22.44it/s]

batch 410 loss: 0.24457489103078842


Train, Epoch 1 / 20:  27%|██▋       | 424/1563 [00:19<00:51, 22.28it/s]

batch 420 loss: 0.2688329368829727


Train, Epoch 1 / 20:  28%|██▊       | 433/1563 [00:20<00:49, 22.70it/s]

batch 430 loss: 0.26444187462329866


Train, Epoch 1 / 20:  28%|██▊       | 442/1563 [00:20<00:51, 21.96it/s]

batch 440 loss: 0.37969709038734434


Train, Epoch 1 / 20:  29%|██▉       | 454/1563 [00:21<00:50, 22.16it/s]

batch 450 loss: 0.2617186024785042


Train, Epoch 1 / 20:  30%|██▉       | 463/1563 [00:21<00:49, 22.01it/s]

batch 460 loss: 0.22597186267375946


Train, Epoch 1 / 20:  30%|███       | 472/1563 [00:21<00:49, 22.18it/s]

batch 470 loss: 0.23442681953310968


Train, Epoch 1 / 20:  31%|███       | 484/1563 [00:22<00:47, 22.58it/s]

batch 480 loss: 0.18802253603935243


Train, Epoch 1 / 20:  32%|███▏      | 493/1563 [00:22<00:47, 22.74it/s]

batch 490 loss: 0.28402462005615237


Train, Epoch 1 / 20:  32%|███▏      | 502/1563 [00:23<00:47, 22.31it/s]

batch 500 loss: 0.2398355718702078


Train, Epoch 1 / 20:  33%|███▎      | 514/1563 [00:23<00:47, 22.29it/s]

batch 510 loss: 0.2822267085313797


Train, Epoch 1 / 20:  33%|███▎      | 523/1563 [00:24<00:46, 22.19it/s]

batch 520 loss: 0.22532384991645812


Train, Epoch 1 / 20:  34%|███▍      | 532/1563 [00:24<00:47, 21.90it/s]

batch 530 loss: 0.24422767087817193


Train, Epoch 1 / 20:  35%|███▍      | 544/1563 [00:25<00:45, 22.57it/s]

batch 540 loss: 0.27697748094797137


Train, Epoch 1 / 20:  35%|███▌      | 553/1563 [00:25<00:45, 22.17it/s]

batch 550 loss: 0.2160038486123085


Train, Epoch 1 / 20:  36%|███▌      | 562/1563 [00:25<00:44, 22.38it/s]

batch 560 loss: 0.2910437107086182


Train, Epoch 1 / 20:  37%|███▋      | 574/1563 [00:26<00:43, 22.76it/s]

batch 570 loss: 0.3880449905991554


Train, Epoch 1 / 20:  37%|███▋      | 583/1563 [00:26<00:43, 22.60it/s]

batch 580 loss: 0.3041867569088936


Train, Epoch 1 / 20:  38%|███▊      | 592/1563 [00:27<00:43, 22.41it/s]

batch 590 loss: 0.22231761887669563


Train, Epoch 1 / 20:  38%|███▊      | 601/1563 [00:27<00:44, 21.63it/s]

batch 600 loss: 0.20375871434807777


Train, Epoch 1 / 20:  39%|███▉      | 613/1563 [00:28<00:49, 19.33it/s]

batch 610 loss: 0.2390650436282158


Train, Epoch 1 / 20:  40%|███▉      | 623/1563 [00:28<00:52, 17.74it/s]

batch 620 loss: 0.2992089167237282


Train, Epoch 1 / 20:  40%|████      | 633/1563 [00:29<00:51, 17.95it/s]

batch 630 loss: 0.24592050537467003


Train, Epoch 1 / 20:  41%|████      | 642/1563 [00:29<00:46, 19.68it/s]

batch 640 loss: 0.24432943388819695


Train, Epoch 1 / 20:  42%|████▏     | 654/1563 [00:30<00:40, 22.34it/s]

batch 650 loss: 0.26547898054122926


Train, Epoch 1 / 20:  42%|████▏     | 663/1563 [00:30<00:41, 21.70it/s]

batch 660 loss: 0.2953246496617794


Train, Epoch 1 / 20:  43%|████▎     | 672/1563 [00:31<00:40, 22.08it/s]

batch 670 loss: 0.26245166212320326


Train, Epoch 1 / 20:  44%|████▍     | 684/1563 [00:31<00:41, 21.25it/s]

batch 680 loss: 0.39676034450531006


Train, Epoch 1 / 20:  44%|████▍     | 693/1563 [00:32<00:39, 21.91it/s]

batch 690 loss: 0.3164411664009094


Train, Epoch 1 / 20:  45%|████▍     | 702/1563 [00:32<00:38, 22.28it/s]

batch 700 loss: 0.3587719053030014


Train, Epoch 1 / 20:  46%|████▌     | 714/1563 [00:33<00:38, 22.13it/s]

batch 710 loss: 0.30707456469535827


Train, Epoch 1 / 20:  46%|████▋     | 723/1563 [00:33<00:36, 22.74it/s]

batch 720 loss: 0.270429103821516


Train, Epoch 1 / 20:  47%|████▋     | 732/1563 [00:33<00:37, 22.42it/s]

batch 730 loss: 0.26548769772052766


Train, Epoch 1 / 20:  48%|████▊     | 744/1563 [00:34<00:37, 21.97it/s]

batch 740 loss: 0.3058388829231262


Train, Epoch 1 / 20:  48%|████▊     | 753/1563 [00:34<00:36, 22.43it/s]

batch 750 loss: 0.3455552488565445


Train, Epoch 1 / 20:  49%|████▉     | 762/1563 [00:35<00:35, 22.54it/s]

batch 760 loss: 0.29195225834846494


Train, Epoch 1 / 20:  50%|████▉     | 774/1563 [00:35<00:34, 22.60it/s]

batch 770 loss: 0.24401937648653985


Train, Epoch 1 / 20:  50%|█████     | 783/1563 [00:36<00:34, 22.79it/s]

batch 780 loss: 0.2972130745649338


Train, Epoch 1 / 20:  51%|█████     | 792/1563 [00:36<00:33, 23.20it/s]

batch 790 loss: 0.32252675890922544


Train, Epoch 1 / 20:  51%|█████▏    | 804/1563 [00:37<00:32, 23.19it/s]

batch 800 loss: 0.2602537080645561


Train, Epoch 1 / 20:  52%|█████▏    | 813/1563 [00:37<00:33, 22.71it/s]

batch 810 loss: 0.24255440905690193


Train, Epoch 1 / 20:  53%|█████▎    | 822/1563 [00:37<00:33, 22.38it/s]

batch 820 loss: 0.2927178904414177


Train, Epoch 1 / 20:  53%|█████▎    | 834/1563 [00:38<00:32, 22.37it/s]

batch 830 loss: 0.29346292093396187


Train, Epoch 1 / 20:  54%|█████▍    | 843/1563 [00:38<00:31, 22.64it/s]

batch 840 loss: 0.2815805867314339


Train, Epoch 1 / 20:  55%|█████▍    | 852/1563 [00:39<00:31, 22.52it/s]

batch 850 loss: 0.3246880814433098


Train, Epoch 1 / 20:  55%|█████▌    | 861/1563 [00:39<00:32, 21.86it/s]

batch 860 loss: 0.32655875086784364


Train, Epoch 1 / 20:  56%|█████▌    | 873/1563 [00:40<00:35, 19.29it/s]

batch 870 loss: 0.3011837527155876


Train, Epoch 1 / 20:  56%|█████▋    | 883/1563 [00:40<00:37, 18.17it/s]

batch 880 loss: 0.28140928521752356


Train, Epoch 1 / 20:  57%|█████▋    | 893/1563 [00:41<00:38, 17.41it/s]

batch 890 loss: 0.2576575823128223


Train, Epoch 1 / 20:  58%|█████▊    | 904/1563 [00:41<00:31, 21.10it/s]

batch 900 loss: 0.3052577212452888


Train, Epoch 1 / 20:  58%|█████▊    | 913/1563 [00:42<00:29, 21.98it/s]

batch 910 loss: 0.30192373394966127


Train, Epoch 1 / 20:  59%|█████▉    | 922/1563 [00:42<00:29, 21.89it/s]

batch 920 loss: 0.3364280924201012


Train, Epoch 1 / 20:  60%|█████▉    | 934/1563 [00:43<00:28, 22.36it/s]

batch 930 loss: 0.34617120325565337


Train, Epoch 1 / 20:  60%|██████    | 943/1563 [00:43<00:28, 22.04it/s]

batch 940 loss: 0.31570892184972765


Train, Epoch 1 / 20:  61%|██████    | 952/1563 [00:44<00:27, 21.87it/s]

batch 950 loss: 0.23529833927750587


Train, Epoch 1 / 20:  62%|██████▏   | 964/1563 [00:44<00:27, 21.97it/s]

batch 960 loss: 0.260196440666914


Train, Epoch 1 / 20:  62%|██████▏   | 973/1563 [00:45<00:26, 22.09it/s]

batch 970 loss: 0.3099876239895821


Train, Epoch 1 / 20:  63%|██████▎   | 982/1563 [00:45<00:25, 22.44it/s]

batch 980 loss: 0.324512754380703


Train, Epoch 1 / 20:  64%|██████▎   | 994/1563 [00:45<00:24, 22.77it/s]

batch 990 loss: 0.32952682450413706


Train, Epoch 1 / 20:  64%|██████▍   | 1003/1563 [00:46<00:24, 22.92it/s]

batch 1000 loss: 0.2108329191803932


Train, Epoch 1 / 20:  65%|██████▍   | 1012/1563 [00:46<00:24, 22.63it/s]

batch 1010 loss: 0.2596748426556587


Train, Epoch 1 / 20:  66%|██████▌   | 1024/1563 [00:47<00:24, 22.37it/s]

batch 1020 loss: 0.31626269966363907


Train, Epoch 1 / 20:  66%|██████▌   | 1033/1563 [00:47<00:23, 22.90it/s]

batch 1030 loss: 0.321961634606123


Train, Epoch 1 / 20:  67%|██████▋   | 1042/1563 [00:48<00:22, 23.04it/s]

batch 1040 loss: 0.2639449715614319


Train, Epoch 1 / 20:  67%|██████▋   | 1054/1563 [00:48<00:22, 22.44it/s]

batch 1050 loss: 0.25613000988960266


Train, Epoch 1 / 20:  68%|██████▊   | 1063/1563 [00:48<00:21, 22.80it/s]

batch 1060 loss: 0.27062033414840697


Train, Epoch 1 / 20:  69%|██████▊   | 1072/1563 [00:49<00:21, 22.38it/s]

batch 1070 loss: 0.23243875354528426


Train, Epoch 1 / 20:  69%|██████▉   | 1084/1563 [00:49<00:21, 22.65it/s]

batch 1080 loss: 0.21979021802544593


Train, Epoch 1 / 20:  70%|██████▉   | 1093/1563 [00:50<00:21, 22.26it/s]

batch 1090 loss: 0.29293536245822904


Train, Epoch 1 / 20:  71%|███████   | 1102/1563 [00:50<00:20, 21.99it/s]

batch 1100 loss: 0.29212818294763565


Train, Epoch 1 / 20:  71%|███████▏  | 1114/1563 [00:51<00:20, 22.16it/s]

batch 1110 loss: 0.3066870704293251


Train, Epoch 1 / 20:  72%|███████▏  | 1123/1563 [00:51<00:21, 20.05it/s]

batch 1120 loss: 0.26885546296834945


Train, Epoch 1 / 20:  72%|███████▏  | 1132/1563 [00:52<00:22, 19.07it/s]

batch 1130 loss: 0.2540665678679943


Train, Epoch 1 / 20:  73%|███████▎  | 1142/1563 [00:52<00:23, 17.93it/s]

batch 1140 loss: 0.28312249630689623


Train, Epoch 1 / 20:  74%|███████▎  | 1152/1563 [00:53<00:23, 17.67it/s]

batch 1150 loss: 0.2269837126135826


Train, Epoch 1 / 20:  74%|███████▍  | 1162/1563 [00:53<00:19, 20.78it/s]

batch 1160 loss: 0.3895955055952072


Train, Epoch 1 / 20:  75%|███████▌  | 1174/1563 [00:54<00:17, 22.21it/s]

batch 1170 loss: 0.2991556242108345


Train, Epoch 1 / 20:  76%|███████▌  | 1183/1563 [00:54<00:17, 22.21it/s]

batch 1180 loss: 0.26074802726507185


Train, Epoch 1 / 20:  76%|███████▋  | 1192/1563 [00:55<00:16, 22.60it/s]

batch 1190 loss: 0.2809549242258072


Train, Epoch 1 / 20:  77%|███████▋  | 1204/1563 [00:55<00:15, 22.82it/s]

batch 1200 loss: 0.23300845697522163


Train, Epoch 1 / 20:  78%|███████▊  | 1213/1563 [00:56<00:15, 23.11it/s]

batch 1210 loss: 0.26539884954690934


Train, Epoch 1 / 20:  78%|███████▊  | 1222/1563 [00:56<00:15, 22.52it/s]

batch 1220 loss: 0.28043911755084994


Train, Epoch 1 / 20:  79%|███████▉  | 1234/1563 [00:56<00:14, 22.53it/s]

batch 1230 loss: 0.31220094859600067


Train, Epoch 1 / 20:  80%|███████▉  | 1243/1563 [00:57<00:14, 22.09it/s]

batch 1240 loss: 0.2431547585874796


Train, Epoch 1 / 20:  80%|████████  | 1252/1563 [00:57<00:14, 22.11it/s]

batch 1250 loss: 0.3051463901996613


Train, Epoch 1 / 20:  81%|████████  | 1264/1563 [00:58<00:13, 22.33it/s]

batch 1260 loss: 0.27218920513987543


Train, Epoch 1 / 20:  81%|████████▏ | 1273/1563 [00:58<00:13, 22.03it/s]

batch 1270 loss: 0.3177155241370201


Train, Epoch 1 / 20:  82%|████████▏ | 1282/1563 [00:59<00:12, 22.42it/s]

batch 1280 loss: 0.35636793822050095


Train, Epoch 1 / 20:  83%|████████▎ | 1294/1563 [00:59<00:11, 22.70it/s]

batch 1290 loss: 0.23894425556063653


Train, Epoch 1 / 20:  83%|████████▎ | 1303/1563 [01:00<00:11, 22.63it/s]

batch 1300 loss: 0.322389779239893


Train, Epoch 1 / 20:  84%|████████▍ | 1312/1563 [01:00<00:11, 22.52it/s]

batch 1310 loss: 0.31978911757469175


Train, Epoch 1 / 20:  85%|████████▍ | 1324/1563 [01:01<00:10, 22.84it/s]

batch 1320 loss: 0.29604066610336305


Train, Epoch 1 / 20:  85%|████████▌ | 1333/1563 [01:01<00:10, 22.83it/s]

batch 1330 loss: 0.2966657891869545


Train, Epoch 1 / 20:  86%|████████▌ | 1342/1563 [01:01<00:09, 22.38it/s]

batch 1340 loss: 0.2824313689023256


Train, Epoch 1 / 20:  87%|████████▋ | 1354/1563 [01:02<00:09, 22.28it/s]

batch 1350 loss: 0.31585348546504977


Train, Epoch 1 / 20:  87%|████████▋ | 1363/1563 [01:02<00:09, 21.74it/s]

batch 1360 loss: 0.3208422526717186


Train, Epoch 1 / 20:  88%|████████▊ | 1372/1563 [01:03<00:08, 21.61it/s]

batch 1370 loss: 0.271202988922596


Train, Epoch 1 / 20:  88%|████████▊ | 1381/1563 [01:03<00:08, 20.33it/s]

batch 1380 loss: 0.27199890911579133


Train, Epoch 1 / 20:  89%|████████▉ | 1393/1563 [01:04<00:08, 20.01it/s]

batch 1390 loss: 0.2915151111781597


Train, Epoch 1 / 20:  90%|████████▉ | 1403/1563 [01:04<00:08, 19.21it/s]

batch 1400 loss: 0.2396743156015873


Train, Epoch 1 / 20:  90%|█████████ | 1413/1563 [01:05<00:08, 17.79it/s]

batch 1410 loss: 0.22237388864159585


Train, Epoch 1 / 20:  91%|█████████ | 1424/1563 [01:05<00:06, 21.35it/s]

batch 1420 loss: 0.2782431557774544


Train, Epoch 1 / 20:  92%|█████████▏| 1433/1563 [01:06<00:05, 22.42it/s]

batch 1430 loss: 0.26793055385351183


Train, Epoch 1 / 20:  92%|█████████▏| 1442/1563 [01:06<00:05, 22.71it/s]

batch 1440 loss: 0.353572216629982


Train, Epoch 1 / 20:  93%|█████████▎| 1454/1563 [01:07<00:04, 23.06it/s]

batch 1450 loss: 0.26053204089403154


Train, Epoch 1 / 20:  94%|█████████▎| 1463/1563 [01:07<00:04, 22.58it/s]

batch 1460 loss: 0.2529899820685387


Train, Epoch 1 / 20:  94%|█████████▍| 1472/1563 [01:07<00:03, 22.77it/s]

batch 1470 loss: 0.23269649520516394


Train, Epoch 1 / 20:  95%|█████████▍| 1484/1563 [01:08<00:03, 22.83it/s]

batch 1480 loss: 0.2305024355649948


Train, Epoch 1 / 20:  96%|█████████▌| 1493/1563 [01:08<00:03, 22.64it/s]

batch 1490 loss: 0.3499841868877411


Train, Epoch 1 / 20:  96%|█████████▌| 1502/1563 [01:09<00:02, 22.89it/s]

batch 1500 loss: 0.2931295961141586


Train, Epoch 1 / 20:  97%|█████████▋| 1514/1563 [01:09<00:02, 23.05it/s]

batch 1510 loss: 0.2858744546771049


Train, Epoch 1 / 20:  97%|█████████▋| 1523/1563 [01:10<00:01, 22.80it/s]

batch 1520 loss: 0.22431859895586967


Train, Epoch 1 / 20:  98%|█████████▊| 1532/1563 [01:10<00:01, 22.76it/s]

batch 1530 loss: 0.24999582171440124


Train, Epoch 1 / 20:  99%|█████████▉| 1544/1563 [01:11<00:00, 22.54it/s]

batch 1540 loss: 0.2735913619399071


Train, Epoch 1 / 20:  99%|█████████▉| 1553/1563 [01:11<00:00, 22.17it/s]

batch 1550 loss: 0.32342141792178153


Train, Epoch 1 / 20: 100%|██████████| 1563/1563 [01:11<00:00, 21.71it/s]


batch 1560 loss: 0.3315111678093672


Test, Epoch 1 / 20: 100%|██████████| 1563/1563 [00:36<00:00, 42.50it/s]


Epoch 1, loss: 0.48676942086488006, accuracy: 0.80512


Train, Epoch 2 / 20:   1%|          | 12/1563 [00:00<01:08, 22.62it/s]

batch 10 loss: 0.22359880357980727


Train, Epoch 2 / 20:   2%|▏         | 24/1563 [00:01<01:07, 22.65it/s]

batch 20 loss: 0.2677243433892727


Train, Epoch 2 / 20:   2%|▏         | 33/1563 [00:01<01:08, 22.47it/s]

batch 30 loss: 0.29738154262304306


Train, Epoch 2 / 20:   3%|▎         | 42/1563 [00:01<01:07, 22.49it/s]

batch 40 loss: 0.27598222233355046


Train, Epoch 2 / 20:   3%|▎         | 54/1563 [00:02<01:07, 22.22it/s]

batch 50 loss: 0.22231195420026778


Train, Epoch 2 / 20:   4%|▍         | 63/1563 [00:02<01:08, 21.96it/s]

batch 60 loss: 0.23784417882561684


Train, Epoch 2 / 20:   5%|▍         | 73/1563 [00:03<01:17, 19.29it/s]

batch 70 loss: 0.20704808607697486


Train, Epoch 2 / 20:   5%|▌         | 83/1563 [00:03<01:18, 18.81it/s]

batch 80 loss: 0.2652584046125412


Train, Epoch 2 / 20:   6%|▌         | 93/1563 [00:04<01:21, 18.02it/s]

batch 90 loss: 0.2825226474553347


Train, Epoch 2 / 20:   7%|▋         | 102/1563 [00:04<01:17, 18.81it/s]

batch 100 loss: 0.24501973688602446


Train, Epoch 2 / 20:   7%|▋         | 114/1563 [00:05<01:07, 21.36it/s]

batch 110 loss: 0.36398832499980927


Train, Epoch 2 / 20:   8%|▊         | 123/1563 [00:05<01:05, 21.98it/s]

batch 120 loss: 0.24911485761404037


Train, Epoch 2 / 20:   8%|▊         | 132/1563 [00:06<01:05, 21.85it/s]

batch 130 loss: 0.2248636581003666


Train, Epoch 2 / 20:   9%|▉         | 144/1563 [00:06<01:02, 22.53it/s]

batch 140 loss: 0.24585519395768643


Train, Epoch 2 / 20:  10%|▉         | 153/1563 [00:07<01:02, 22.62it/s]

batch 150 loss: 0.26334749460220336


Train, Epoch 2 / 20:  10%|█         | 162/1563 [00:07<01:01, 22.85it/s]

batch 160 loss: 0.23019812516868116


Train, Epoch 2 / 20:  11%|█         | 174/1563 [00:08<01:01, 22.46it/s]

batch 170 loss: 0.29913735538721087


Train, Epoch 2 / 20:  12%|█▏        | 183/1563 [00:08<01:01, 22.38it/s]

batch 180 loss: 0.240610421448946


Train, Epoch 2 / 20:  12%|█▏        | 192/1563 [00:08<01:01, 22.27it/s]

batch 190 loss: 0.22804616764187813


Train, Epoch 2 / 20:  13%|█▎        | 204/1563 [00:09<01:00, 22.59it/s]

batch 200 loss: 0.21862371042370796


Train, Epoch 2 / 20:  14%|█▎        | 213/1563 [00:09<01:00, 22.31it/s]

batch 210 loss: 0.20374032370746137


Train, Epoch 2 / 20:  14%|█▍        | 222/1563 [00:10<01:01, 21.81it/s]

batch 220 loss: 0.30587270483374596


Train, Epoch 2 / 20:  15%|█▍        | 234/1563 [00:10<00:59, 22.39it/s]

batch 230 loss: 0.3089796707034111


Train, Epoch 2 / 20:  16%|█▌        | 243/1563 [00:11<00:58, 22.49it/s]

batch 240 loss: 0.22711289003491403


Train, Epoch 2 / 20:  16%|█▌        | 252/1563 [00:11<00:58, 22.57it/s]

batch 250 loss: 0.27095199301838874


Train, Epoch 2 / 20:  17%|█▋        | 264/1563 [00:12<00:56, 22.83it/s]

batch 260 loss: 0.2173399981111288


Train, Epoch 2 / 20:  17%|█▋        | 273/1563 [00:12<00:56, 22.74it/s]

batch 270 loss: 0.3822172611951828


Train, Epoch 2 / 20:  18%|█▊        | 282/1563 [00:13<00:56, 22.72it/s]

batch 280 loss: 0.2964117258787155


Train, Epoch 2 / 20:  19%|█▉        | 294/1563 [00:13<00:55, 22.67it/s]

batch 290 loss: 0.28607070744037627


Train, Epoch 2 / 20:  19%|█▉        | 303/1563 [00:13<00:55, 22.74it/s]

batch 300 loss: 0.2951835870742798


Train, Epoch 2 / 20:  20%|█▉        | 312/1563 [00:14<00:57, 21.80it/s]

batch 310 loss: 0.26440069302916525


Train, Epoch 2 / 20:  21%|██        | 324/1563 [00:14<00:58, 21.19it/s]

batch 320 loss: 0.31794060245156286


Train, Epoch 2 / 20:  21%|██        | 332/1563 [00:15<01:03, 19.42it/s]

batch 330 loss: 0.2666277624666691


Train, Epoch 2 / 20:  22%|██▏       | 341/1563 [00:15<01:03, 19.37it/s]

batch 340 loss: 0.25814235359430315


Train, Epoch 2 / 20:  23%|██▎       | 352/1563 [00:16<01:07, 17.94it/s]

batch 350 loss: 0.20013598948717118


Train, Epoch 2 / 20:  23%|██▎       | 364/1563 [00:17<01:00, 19.96it/s]

batch 360 loss: 0.29859926253557206


Train, Epoch 2 / 20:  24%|██▍       | 373/1563 [00:17<00:54, 21.87it/s]

batch 370 loss: 0.36233904361724856


Train, Epoch 2 / 20:  24%|██▍       | 382/1563 [00:17<00:51, 22.76it/s]

batch 380 loss: 0.2314003251492977


Train, Epoch 2 / 20:  25%|██▌       | 394/1563 [00:18<00:53, 21.98it/s]

batch 390 loss: 0.2741846010088921


Train, Epoch 2 / 20:  26%|██▌       | 403/1563 [00:18<00:51, 22.33it/s]

batch 400 loss: 0.2652777537703514


Train, Epoch 2 / 20:  26%|██▋       | 412/1563 [00:19<00:51, 22.17it/s]

batch 410 loss: 0.26769511476159097


Train, Epoch 2 / 20:  27%|██▋       | 424/1563 [00:19<00:51, 21.91it/s]

batch 420 loss: 0.22079671248793603


Train, Epoch 2 / 20:  28%|██▊       | 433/1563 [00:20<00:50, 22.30it/s]

batch 430 loss: 0.3163021966814995


Train, Epoch 2 / 20:  28%|██▊       | 442/1563 [00:20<00:51, 21.92it/s]

batch 440 loss: 0.24153739362955093


Train, Epoch 2 / 20:  29%|██▉       | 454/1563 [00:21<00:49, 22.28it/s]

batch 450 loss: 0.29395539611577987


Train, Epoch 2 / 20:  30%|██▉       | 463/1563 [00:21<00:49, 22.35it/s]

batch 460 loss: 0.2698893532156944


Train, Epoch 2 / 20:  30%|███       | 472/1563 [00:21<00:49, 22.09it/s]

batch 470 loss: 0.23503275215625763


Train, Epoch 2 / 20:  31%|███       | 484/1563 [00:22<00:48, 22.22it/s]

batch 480 loss: 0.1515778608620167


Train, Epoch 2 / 20:  32%|███▏      | 493/1563 [00:22<00:48, 21.96it/s]

batch 490 loss: 0.2565115548670292


Train, Epoch 2 / 20:  32%|███▏      | 502/1563 [00:23<00:47, 22.33it/s]

batch 500 loss: 0.22456907518208027


Train, Epoch 2 / 20:  33%|███▎      | 514/1563 [00:23<00:46, 22.44it/s]

batch 510 loss: 0.31641671657562254


Train, Epoch 2 / 20:  33%|███▎      | 523/1563 [00:24<00:45, 22.71it/s]

batch 520 loss: 0.3063521280884743


Train, Epoch 2 / 20:  34%|███▍      | 532/1563 [00:24<00:46, 22.33it/s]

batch 530 loss: 0.24469681084156036


Train, Epoch 2 / 20:  35%|███▍      | 544/1563 [00:25<00:45, 22.51it/s]

batch 540 loss: 0.25442634895443916


Train, Epoch 2 / 20:  35%|███▌      | 553/1563 [00:25<00:44, 22.63it/s]

batch 550 loss: 0.3527264267206192


Train, Epoch 2 / 20:  36%|███▌      | 562/1563 [00:25<00:44, 22.56it/s]

batch 560 loss: 0.23815193101763726


Train, Epoch 2 / 20:  37%|███▋      | 574/1563 [00:26<00:43, 22.49it/s]

batch 570 loss: 0.30890104100108146


Train, Epoch 2 / 20:  37%|███▋      | 583/1563 [00:26<00:45, 21.74it/s]

batch 580 loss: 0.32201955989003184


Train, Epoch 2 / 20:  38%|███▊      | 592/1563 [00:27<00:48, 19.99it/s]

batch 590 loss: 0.21314020901918412


Train, Epoch 2 / 20:  39%|███▊      | 602/1563 [00:27<00:50, 19.15it/s]

batch 600 loss: 0.31390697956085206


Train, Epoch 2 / 20:  39%|███▉      | 612/1563 [00:28<00:53, 17.93it/s]

batch 610 loss: 0.2978944659233093


Train, Epoch 2 / 20:  40%|███▉      | 624/1563 [00:29<00:46, 20.22it/s]

batch 620 loss: 0.2542321480810642


Train, Epoch 2 / 20:  40%|████      | 633/1563 [00:29<00:42, 21.89it/s]

batch 630 loss: 0.3189414456486702


Train, Epoch 2 / 20:  41%|████      | 642/1563 [00:29<00:41, 21.97it/s]

batch 640 loss: 0.26514222621917727


Train, Epoch 2 / 20:  42%|████▏     | 654/1563 [00:30<00:40, 22.51it/s]

batch 650 loss: 0.27384813614189624


Train, Epoch 2 / 20:  42%|████▏     | 663/1563 [00:30<00:40, 22.48it/s]

batch 660 loss: 0.32596625834703447


Train, Epoch 2 / 20:  43%|████▎     | 672/1563 [00:31<00:40, 22.14it/s]

batch 670 loss: 0.3434128314256668


Train, Epoch 2 / 20:  44%|████▍     | 684/1563 [00:31<00:38, 22.61it/s]

batch 680 loss: 0.2852489143610001


Train, Epoch 2 / 20:  44%|████▍     | 693/1563 [00:32<00:38, 22.40it/s]

batch 690 loss: 0.35579894185066224


Train, Epoch 2 / 20:  45%|████▍     | 702/1563 [00:32<00:38, 22.62it/s]

batch 700 loss: 0.32050370126962663


Train, Epoch 2 / 20:  46%|████▌     | 714/1563 [00:33<00:38, 22.31it/s]

batch 710 loss: 0.3373748481273651


Train, Epoch 2 / 20:  46%|████▋     | 723/1563 [00:33<00:37, 22.42it/s]

batch 720 loss: 0.24860634580254554


Train, Epoch 2 / 20:  47%|████▋     | 732/1563 [00:33<00:36, 22.65it/s]

batch 730 loss: 0.28692388609051706


Train, Epoch 2 / 20:  48%|████▊     | 744/1563 [00:34<00:35, 22.82it/s]

batch 740 loss: 0.3708557166159153


Train, Epoch 2 / 20:  48%|████▊     | 753/1563 [00:34<00:36, 21.94it/s]

batch 750 loss: 0.27963917404413224


Train, Epoch 2 / 20:  49%|████▉     | 762/1563 [00:35<00:36, 21.78it/s]

batch 760 loss: 0.25731126219034195


Train, Epoch 2 / 20:  50%|████▉     | 774/1563 [00:35<00:35, 22.25it/s]

batch 770 loss: 0.28061567693948747


Train, Epoch 2 / 20:  50%|█████     | 783/1563 [00:36<00:35, 22.28it/s]

batch 780 loss: 0.2787208750844002


Train, Epoch 2 / 20:  51%|█████     | 792/1563 [00:36<00:34, 22.42it/s]

batch 790 loss: 0.29051663428545


Train, Epoch 2 / 20:  51%|█████▏    | 804/1563 [00:37<00:33, 22.88it/s]

batch 800 loss: 0.31972464770078657


Train, Epoch 2 / 20:  52%|█████▏    | 813/1563 [00:37<00:33, 22.68it/s]

batch 810 loss: 0.2641400404274464


Train, Epoch 2 / 20:  53%|█████▎    | 822/1563 [00:37<00:33, 22.44it/s]

batch 820 loss: 0.26135381162166593


Train, Epoch 2 / 20:  53%|█████▎    | 834/1563 [00:38<00:32, 22.51it/s]

batch 830 loss: 0.2569774530827999


Train, Epoch 2 / 20:  54%|█████▍    | 843/1563 [00:38<00:33, 21.24it/s]

batch 840 loss: 0.20004781559109688


Train, Epoch 2 / 20:  55%|█████▍    | 853/1563 [00:39<00:37, 18.86it/s]

batch 850 loss: 0.28647490292787553


Train, Epoch 2 / 20:  55%|█████▌    | 863/1563 [00:40<00:39, 17.77it/s]

batch 860 loss: 0.32801041603088377


Train, Epoch 2 / 20:  56%|█████▌    | 873/1563 [00:40<00:43, 16.01it/s]

batch 870 loss: 0.2945342496037483


Train, Epoch 2 / 20:  56%|█████▋    | 883/1563 [00:41<00:34, 19.95it/s]

batch 880 loss: 0.2628392592072487


Train, Epoch 2 / 20:  57%|█████▋    | 892/1563 [00:41<00:31, 21.55it/s]

batch 890 loss: 0.2841470681130886


Train, Epoch 2 / 20:  58%|█████▊    | 904/1563 [00:42<00:29, 22.69it/s]

batch 900 loss: 0.2809849128127098


Train, Epoch 2 / 20:  58%|█████▊    | 913/1563 [00:42<00:29, 22.27it/s]

batch 910 loss: 0.23279405683279036


Train, Epoch 2 / 20:  59%|█████▉    | 922/1563 [00:42<00:28, 22.34it/s]

batch 920 loss: 0.2281072475016117


Train, Epoch 2 / 20:  60%|█████▉    | 934/1563 [00:43<00:28, 22.35it/s]

batch 930 loss: 0.29906653687357904


Train, Epoch 2 / 20:  60%|██████    | 943/1563 [00:43<00:27, 22.29it/s]

batch 940 loss: 0.27474588602781297


Train, Epoch 2 / 20:  61%|██████    | 952/1563 [00:44<00:28, 21.72it/s]

batch 950 loss: 0.20826276317238807


Train, Epoch 2 / 20:  62%|██████▏   | 964/1563 [00:44<00:27, 21.96it/s]

batch 960 loss: 0.2361849159002304


Train, Epoch 2 / 20:  62%|██████▏   | 973/1563 [00:45<00:26, 22.04it/s]

batch 970 loss: 0.2943937689065933


Train, Epoch 2 / 20:  63%|██████▎   | 982/1563 [00:45<00:26, 22.11it/s]

batch 980 loss: 0.34023143351078033


Train, Epoch 2 / 20:  64%|██████▎   | 994/1563 [00:46<00:24, 23.05it/s]

batch 990 loss: 0.26886272206902506


Train, Epoch 2 / 20:  64%|██████▍   | 1003/1563 [00:46<00:25, 22.27it/s]

batch 1000 loss: 0.27751443833112716


Train, Epoch 2 / 20:  65%|██████▍   | 1012/1563 [00:46<00:24, 22.46it/s]

batch 1010 loss: 0.29921081811189654


Train, Epoch 2 / 20:  66%|██████▌   | 1024/1563 [00:47<00:23, 22.48it/s]

batch 1020 loss: 0.35324666649103165


Train, Epoch 2 / 20:  66%|██████▌   | 1033/1563 [00:47<00:23, 22.82it/s]

batch 1030 loss: 0.2320831336081028


Train, Epoch 2 / 20:  67%|██████▋   | 1042/1563 [00:48<00:23, 22.19it/s]

batch 1040 loss: 0.27137555107474326


Train, Epoch 2 / 20:  67%|██████▋   | 1054/1563 [00:48<00:22, 22.41it/s]

batch 1050 loss: 0.20411449670791626


Train, Epoch 2 / 20:  68%|██████▊   | 1063/1563 [00:49<00:21, 22.85it/s]

batch 1060 loss: 0.23223066404461862


Train, Epoch 2 / 20:  69%|██████▊   | 1072/1563 [00:49<00:22, 21.99it/s]

batch 1070 loss: 0.2813676193356514


Train, Epoch 2 / 20:  69%|██████▉   | 1084/1563 [00:50<00:21, 22.44it/s]

batch 1080 loss: 0.25684620440006256


Train, Epoch 2 / 20:  70%|██████▉   | 1093/1563 [00:50<00:20, 22.46it/s]

batch 1090 loss: 0.23975468873977662


Train, Epoch 2 / 20:  71%|███████   | 1102/1563 [00:50<00:22, 20.22it/s]

batch 1100 loss: 0.3803065732121468


Train, Epoch 2 / 20:  71%|███████   | 1113/1563 [00:51<00:24, 18.55it/s]

batch 1110 loss: 0.28520754128694537


Train, Epoch 2 / 20:  72%|███████▏  | 1122/1563 [00:52<00:23, 19.01it/s]

batch 1120 loss: 0.3246051326394081


Train, Epoch 2 / 20:  72%|███████▏  | 1132/1563 [00:52<00:24, 17.57it/s]

batch 1130 loss: 0.24714254923164844


Train, Epoch 2 / 20:  73%|███████▎  | 1142/1563 [00:53<00:20, 20.34it/s]

batch 1140 loss: 0.33695184886455537


Train, Epoch 2 / 20:  74%|███████▍  | 1154/1563 [00:53<00:18, 21.95it/s]

batch 1150 loss: 0.2626219891011715


Train, Epoch 2 / 20:  74%|███████▍  | 1163/1563 [00:54<00:17, 22.64it/s]

batch 1160 loss: 0.2583519406616688


Train, Epoch 2 / 20:  75%|███████▍  | 1172/1563 [00:54<00:17, 22.37it/s]

batch 1170 loss: 0.28336876779794695


Train, Epoch 2 / 20:  76%|███████▌  | 1184/1563 [00:54<00:16, 22.76it/s]

batch 1180 loss: 0.24714018180966377


Train, Epoch 2 / 20:  76%|███████▋  | 1193/1563 [00:55<00:16, 22.59it/s]

batch 1190 loss: 0.23464671969413758


Train, Epoch 2 / 20:  77%|███████▋  | 1202/1563 [00:55<00:16, 22.13it/s]

batch 1200 loss: 0.28225120082497596


Train, Epoch 2 / 20:  78%|███████▊  | 1214/1563 [00:56<00:15, 22.50it/s]

batch 1210 loss: 0.22748072743415831


Train, Epoch 2 / 20:  78%|███████▊  | 1223/1563 [00:56<00:15, 22.66it/s]

batch 1220 loss: 0.3016982898116112


Train, Epoch 2 / 20:  79%|███████▉  | 1232/1563 [00:57<00:14, 23.05it/s]

batch 1230 loss: 0.28894910514354705


Train, Epoch 2 / 20:  80%|███████▉  | 1244/1563 [00:57<00:14, 22.68it/s]

batch 1240 loss: 0.3141320325434208


Train, Epoch 2 / 20:  80%|████████  | 1253/1563 [00:58<00:13, 22.37it/s]

batch 1250 loss: 0.28838362246751786


Train, Epoch 2 / 20:  81%|████████  | 1262/1563 [00:58<00:13, 22.46it/s]

batch 1260 loss: 0.22820012755692004


Train, Epoch 2 / 20:  82%|████████▏ | 1274/1563 [00:58<00:12, 22.32it/s]

batch 1270 loss: 0.283554445207119


Train, Epoch 2 / 20:  82%|████████▏ | 1283/1563 [00:59<00:12, 22.60it/s]

batch 1280 loss: 0.25093718618154526


Train, Epoch 2 / 20:  83%|████████▎ | 1292/1563 [00:59<00:12, 22.07it/s]

batch 1290 loss: 0.25340335443615913


Train, Epoch 2 / 20:  83%|████████▎ | 1304/1563 [01:00<00:11, 22.18it/s]

batch 1300 loss: 0.222282013297081


Train, Epoch 2 / 20:  84%|████████▍ | 1313/1563 [01:00<00:11, 22.01it/s]

batch 1310 loss: 0.31376891434192655


Train, Epoch 2 / 20:  85%|████████▍ | 1322/1563 [01:01<00:10, 22.05it/s]

batch 1320 loss: 0.2572066068649292


Train, Epoch 2 / 20:  85%|████████▌ | 1334/1563 [01:01<00:10, 21.96it/s]

batch 1330 loss: 0.23521390073001386


Train, Epoch 2 / 20:  86%|████████▌ | 1343/1563 [01:02<00:09, 22.12it/s]

batch 1340 loss: 0.30237500444054605


Train, Epoch 2 / 20:  87%|████████▋ | 1352/1563 [01:02<00:09, 22.48it/s]

batch 1350 loss: 0.3557967469096184


Train, Epoch 2 / 20:  87%|████████▋ | 1361/1563 [01:02<00:09, 21.36it/s]

batch 1360 loss: 0.26374293118715286


Train, Epoch 2 / 20:  88%|████████▊ | 1373/1563 [01:03<00:10, 18.60it/s]

batch 1370 loss: 0.2789446674287319


Train, Epoch 2 / 20:  88%|████████▊ | 1382/1563 [01:04<00:09, 19.08it/s]

batch 1380 loss: 0.28033297508955


Train, Epoch 2 / 20:  89%|████████▉ | 1392/1563 [01:04<00:09, 17.88it/s]

batch 1390 loss: 0.27865410000085833


Train, Epoch 2 / 20:  90%|████████▉ | 1402/1563 [01:05<00:07, 20.25it/s]

batch 1400 loss: 0.32407477796077727


Train, Epoch 2 / 20:  90%|█████████ | 1414/1563 [01:05<00:06, 22.30it/s]

batch 1410 loss: 0.3468532532453537


Train, Epoch 2 / 20:  91%|█████████ | 1423/1563 [01:06<00:06, 22.38it/s]

batch 1420 loss: 0.2961388774216175


Train, Epoch 2 / 20:  92%|█████████▏| 1432/1563 [01:06<00:05, 22.57it/s]

batch 1430 loss: 0.2568525642156601


Train, Epoch 2 / 20:  92%|█████████▏| 1444/1563 [01:07<00:05, 22.71it/s]

batch 1440 loss: 0.24769699648022653


Train, Epoch 2 / 20:  93%|█████████▎| 1453/1563 [01:07<00:04, 23.09it/s]

batch 1450 loss: 0.3803792968392372


Train, Epoch 2 / 20:  94%|█████████▎| 1462/1563 [01:07<00:04, 22.25it/s]

batch 1460 loss: 0.24317044988274575


Train, Epoch 2 / 20:  94%|█████████▍| 1474/1563 [01:08<00:03, 22.34it/s]

batch 1470 loss: 0.3130302205681801


Train, Epoch 2 / 20:  95%|█████████▍| 1483/1563 [01:08<00:03, 22.34it/s]

batch 1480 loss: 0.28410288617014884


Train, Epoch 2 / 20:  95%|█████████▌| 1492/1563 [01:09<00:03, 22.38it/s]

batch 1490 loss: 0.3020948007702827


Train, Epoch 2 / 20:  96%|█████████▌| 1504/1563 [01:09<00:02, 22.20it/s]

batch 1500 loss: 0.3359214052557945


Train, Epoch 2 / 20:  97%|█████████▋| 1513/1563 [01:10<00:02, 22.35it/s]

batch 1510 loss: 0.2439854472875595


Train, Epoch 2 / 20:  97%|█████████▋| 1522/1563 [01:10<00:01, 22.30it/s]

batch 1520 loss: 0.23662519827485085


Train, Epoch 2 / 20:  98%|█████████▊| 1534/1563 [01:11<00:01, 22.58it/s]

batch 1530 loss: 0.19853571206331252


Train, Epoch 2 / 20:  99%|█████████▊| 1543/1563 [01:11<00:00, 22.34it/s]

batch 1540 loss: 0.19930383041501046


Train, Epoch 2 / 20:  99%|█████████▉| 1552/1563 [01:11<00:00, 22.58it/s]

batch 1550 loss: 0.29684638530015944


Train, Epoch 2 / 20: 100%|██████████| 1563/1563 [01:12<00:00, 21.61it/s]


batch 1560 loss: 0.33290456533432006


Test, Epoch 2 / 20: 100%|██████████| 1563/1563 [00:36<00:00, 42.34it/s]


Epoch 2, loss: 0.4960297135561705, accuracy: 0.7998


Train, Epoch 3 / 20:   1%|          | 12/1563 [00:00<01:08, 22.58it/s]

batch 10 loss: 0.29785674512386323


Train, Epoch 3 / 20:   2%|▏         | 24/1563 [00:01<01:08, 22.56it/s]

batch 20 loss: 0.19872682690620422


Train, Epoch 3 / 20:   2%|▏         | 33/1563 [00:01<01:08, 22.33it/s]

batch 30 loss: 0.26921674236655235


Train, Epoch 3 / 20:   3%|▎         | 42/1563 [00:01<01:13, 20.83it/s]

batch 40 loss: 0.16956970021128653


Train, Epoch 3 / 20:   3%|▎         | 53/1563 [00:02<01:20, 18.72it/s]

batch 50 loss: 0.2594796925783157


Train, Epoch 3 / 20:   4%|▍         | 63/1563 [00:03<01:20, 18.58it/s]

batch 60 loss: 0.24548281356692314


Train, Epoch 3 / 20:   5%|▍         | 73/1563 [00:03<01:30, 16.53it/s]

batch 70 loss: 0.30062286034226415


Train, Epoch 3 / 20:   5%|▌         | 83/1563 [00:04<01:14, 19.74it/s]

batch 80 loss: 0.3015428282320499


Train, Epoch 3 / 20:   6%|▌         | 92/1563 [00:04<01:07, 21.64it/s]

batch 90 loss: 0.3056815631687641


Train, Epoch 3 / 20:   7%|▋         | 104/1563 [00:05<01:05, 22.36it/s]

batch 100 loss: 0.20946087017655374


Train, Epoch 3 / 20:   7%|▋         | 113/1563 [00:05<01:04, 22.51it/s]

batch 110 loss: 0.25788299888372423


Train, Epoch 3 / 20:   8%|▊         | 122/1563 [00:05<01:05, 22.17it/s]

batch 120 loss: 0.2197812855243683


Train, Epoch 3 / 20:   9%|▊         | 134/1563 [00:06<01:04, 22.29it/s]

batch 130 loss: 0.19703269377350807


Train, Epoch 3 / 20:   9%|▉         | 143/1563 [00:06<01:03, 22.34it/s]

batch 140 loss: 0.2929012455046177


Train, Epoch 3 / 20:  10%|▉         | 152/1563 [00:07<01:03, 22.17it/s]

batch 150 loss: 0.292148132622242


Train, Epoch 3 / 20:  10%|█         | 164/1563 [00:07<01:02, 22.49it/s]

batch 160 loss: 0.3179284997284412


Train, Epoch 3 / 20:  11%|█         | 173/1563 [00:08<01:01, 22.50it/s]

batch 170 loss: 0.2983833193778992


Train, Epoch 3 / 20:  12%|█▏        | 182/1563 [00:08<01:01, 22.63it/s]

batch 180 loss: 0.26680740490555765


Train, Epoch 3 / 20:  12%|█▏        | 194/1563 [00:09<01:00, 22.74it/s]

batch 190 loss: 0.24068550392985344


Train, Epoch 3 / 20:  13%|█▎        | 203/1563 [00:09<01:02, 21.74it/s]

batch 200 loss: 0.2682196479290724


Train, Epoch 3 / 20:  14%|█▎        | 212/1563 [00:09<01:01, 21.95it/s]

batch 210 loss: 0.3359205648303032


Train, Epoch 3 / 20:  14%|█▍        | 224/1563 [00:10<01:00, 22.06it/s]

batch 220 loss: 0.20950469747185707


Train, Epoch 3 / 20:  15%|█▍        | 233/1563 [00:10<01:00, 21.91it/s]

batch 230 loss: 0.2431492730975151


Train, Epoch 3 / 20:  15%|█▌        | 242/1563 [00:11<01:00, 21.89it/s]

batch 240 loss: 0.29350675642490387


Train, Epoch 3 / 20:  16%|█▋        | 254/1563 [00:11<00:58, 22.43it/s]

batch 250 loss: 0.3167313992977142


Train, Epoch 3 / 20:  17%|█▋        | 263/1563 [00:12<00:59, 21.98it/s]

batch 260 loss: 0.296737365424633


Train, Epoch 3 / 20:  17%|█▋        | 272/1563 [00:12<00:58, 22.23it/s]

batch 270 loss: 0.3433068037033081


Train, Epoch 3 / 20:  18%|█▊        | 284/1563 [00:13<00:58, 21.82it/s]

batch 280 loss: 0.25895831361413


Train, Epoch 3 / 20:  19%|█▊        | 293/1563 [00:13<00:57, 22.09it/s]

batch 290 loss: 0.1919553630053997


Train, Epoch 3 / 20:  19%|█▉        | 302/1563 [00:14<01:01, 20.45it/s]

batch 300 loss: 0.2520569637417793


Train, Epoch 3 / 20:  20%|██        | 313/1563 [00:14<01:06, 18.88it/s]

batch 310 loss: 0.24643508940935135


Train, Epoch 3 / 20:  21%|██        | 323/1563 [00:15<01:11, 17.25it/s]

batch 320 loss: 0.2229522258043289


Train, Epoch 3 / 20:  21%|██▏       | 333/1563 [00:15<01:13, 16.76it/s]

batch 330 loss: 0.36600095182657244


Train, Epoch 3 / 20:  22%|██▏       | 342/1563 [00:16<01:00, 20.33it/s]

batch 340 loss: 0.2510668374598026


Train, Epoch 3 / 20:  23%|██▎       | 354/1563 [00:16<00:54, 22.29it/s]

batch 350 loss: 0.2184263698756695


Train, Epoch 3 / 20:  23%|██▎       | 363/1563 [00:17<00:53, 22.41it/s]

batch 360 loss: 0.23337412849068642


Train, Epoch 3 / 20:  24%|██▍       | 372/1563 [00:17<00:53, 22.26it/s]

batch 370 loss: 0.25779023617506025


Train, Epoch 3 / 20:  25%|██▍       | 384/1563 [00:18<00:52, 22.64it/s]

batch 380 loss: 0.28959950506687165


Train, Epoch 3 / 20:  25%|██▌       | 393/1563 [00:18<00:51, 22.59it/s]

batch 390 loss: 0.25887863636016845


Train, Epoch 3 / 20:  26%|██▌       | 402/1563 [00:18<00:52, 22.27it/s]

batch 400 loss: 0.2965377628803253


Train, Epoch 3 / 20:  26%|██▋       | 414/1563 [00:19<00:51, 22.12it/s]

batch 410 loss: 0.3226799227297306


Train, Epoch 3 / 20:  27%|██▋       | 423/1563 [00:19<00:53, 21.20it/s]

batch 420 loss: 0.25937957465648653


Train, Epoch 3 / 20:  28%|██▊       | 432/1563 [00:20<00:51, 21.78it/s]

batch 430 loss: 0.25134042128920553


Train, Epoch 3 / 20:  28%|██▊       | 444/1563 [00:20<00:50, 22.36it/s]

batch 440 loss: 0.2571523755788803


Train, Epoch 3 / 20:  29%|██▉       | 453/1563 [00:21<00:48, 22.72it/s]

batch 450 loss: 0.2637930765748024


Train, Epoch 3 / 20:  30%|██▉       | 462/1563 [00:21<00:49, 22.24it/s]

batch 460 loss: 0.24436420127749442


Train, Epoch 3 / 20:  30%|███       | 474/1563 [00:22<00:49, 21.99it/s]

batch 470 loss: 0.24194196164608


Train, Epoch 3 / 20:  31%|███       | 483/1563 [00:22<00:49, 22.03it/s]

batch 480 loss: 0.20759119540452958


Train, Epoch 3 / 20:  31%|███▏      | 492/1563 [00:23<00:48, 21.98it/s]

batch 490 loss: 0.26055213809013367


Train, Epoch 3 / 20:  32%|███▏      | 504/1563 [00:23<00:47, 22.15it/s]

batch 500 loss: 0.26152369305491446


Train, Epoch 3 / 20:  33%|███▎      | 513/1563 [00:24<00:48, 21.67it/s]

batch 510 loss: 0.24998647794127465


Train, Epoch 3 / 20:  33%|███▎      | 522/1563 [00:24<00:48, 21.62it/s]

batch 520 loss: 0.2144767791032791


Train, Epoch 3 / 20:  34%|███▍      | 534/1563 [00:24<00:46, 22.03it/s]

batch 530 loss: 0.2528642266988754


Train, Epoch 3 / 20:  35%|███▍      | 543/1563 [00:25<00:46, 21.84it/s]

batch 540 loss: 0.27904205620288847


Train, Epoch 3 / 20:  35%|███▌      | 552/1563 [00:25<00:47, 21.34it/s]

batch 550 loss: 0.2607495725154877


Train, Epoch 3 / 20:  36%|███▌      | 561/1563 [00:26<00:49, 20.11it/s]

batch 560 loss: 0.27208952605724335


Train, Epoch 3 / 20:  37%|███▋      | 572/1563 [00:26<00:53, 18.65it/s]

batch 570 loss: 0.318104500323534


Train, Epoch 3 / 20:  37%|███▋      | 583/1563 [00:27<00:51, 18.99it/s]

batch 580 loss: 0.2972625754773617


Train, Epoch 3 / 20:  38%|███▊      | 593/1563 [00:28<00:53, 18.28it/s]

batch 590 loss: 0.21143655702471734


Train, Epoch 3 / 20:  39%|███▊      | 602/1563 [00:28<00:45, 21.21it/s]

batch 600 loss: 0.22410742118954657


Train, Epoch 3 / 20:  39%|███▉      | 614/1563 [00:28<00:43, 22.06it/s]

batch 610 loss: 0.222938072681427


Train, Epoch 3 / 20:  40%|███▉      | 623/1563 [00:29<00:41, 22.42it/s]

batch 620 loss: 0.20498760044574738


Train, Epoch 3 / 20:  40%|████      | 632/1563 [00:29<00:41, 22.24it/s]

batch 630 loss: 0.2195245735347271


Train, Epoch 3 / 20:  41%|████      | 644/1563 [00:30<00:40, 22.69it/s]

batch 640 loss: 0.30043947100639345


Train, Epoch 3 / 20:  42%|████▏     | 653/1563 [00:30<00:40, 22.56it/s]

batch 650 loss: 0.2113979872316122


Train, Epoch 3 / 20:  42%|████▏     | 662/1563 [00:31<00:40, 22.48it/s]

batch 660 loss: 0.28982909470796586


Train, Epoch 3 / 20:  43%|████▎     | 674/1563 [00:31<00:39, 22.30it/s]

batch 670 loss: 0.28062627241015436


Train, Epoch 3 / 20:  44%|████▎     | 683/1563 [00:32<00:39, 22.25it/s]

batch 680 loss: 0.2891711801290512


Train, Epoch 3 / 20:  44%|████▍     | 692/1563 [00:32<00:39, 22.10it/s]

batch 690 loss: 0.24263158738613128


Train, Epoch 3 / 20:  45%|████▌     | 704/1563 [00:32<00:37, 22.64it/s]

batch 700 loss: 0.2068730529397726


Train, Epoch 3 / 20:  46%|████▌     | 713/1563 [00:33<00:37, 22.42it/s]

batch 710 loss: 0.3003786887973547


Train, Epoch 3 / 20:  46%|████▌     | 722/1563 [00:33<00:38, 22.10it/s]

batch 720 loss: 0.4005410552024841


Train, Epoch 3 / 20:  47%|████▋     | 734/1563 [00:34<00:37, 21.89it/s]

batch 730 loss: 0.34049228951334953


Train, Epoch 3 / 20:  48%|████▊     | 743/1563 [00:34<00:36, 22.19it/s]

batch 740 loss: 0.20657144598662852


Train, Epoch 3 / 20:  48%|████▊     | 752/1563 [00:35<00:36, 22.21it/s]

batch 750 loss: 0.3033738821744919


Train, Epoch 3 / 20:  49%|████▉     | 764/1563 [00:35<00:35, 22.50it/s]

batch 760 loss: 0.23164833709597588


Train, Epoch 3 / 20:  49%|████▉     | 773/1563 [00:36<00:35, 22.46it/s]

batch 770 loss: 0.3076351471245289


Train, Epoch 3 / 20:  50%|█████     | 782/1563 [00:36<00:35, 22.14it/s]

batch 780 loss: 0.2677625298500061


Train, Epoch 3 / 20:  51%|█████     | 794/1563 [00:37<00:34, 22.33it/s]

batch 790 loss: 0.31397735327482224


Train, Epoch 3 / 20:  51%|█████▏    | 803/1563 [00:37<00:33, 22.51it/s]

batch 800 loss: 0.24666507691144943


Train, Epoch 3 / 20:  52%|█████▏    | 812/1563 [00:37<00:33, 22.41it/s]

batch 810 loss: 0.2406129091978073


Train, Epoch 3 / 20:  53%|█████▎    | 821/1563 [00:38<00:36, 20.14it/s]

batch 820 loss: 0.3555129706859589


Train, Epoch 3 / 20:  53%|█████▎    | 832/1563 [00:38<00:38, 19.04it/s]

batch 830 loss: 0.3005788832902908


Train, Epoch 3 / 20:  54%|█████▍    | 843/1563 [00:39<00:39, 18.09it/s]

batch 840 loss: 0.30603174716234205


Train, Epoch 3 / 20:  55%|█████▍    | 853/1563 [00:40<00:41, 16.96it/s]

batch 850 loss: 0.29684182405471804


Train, Epoch 3 / 20:  55%|█████▌    | 862/1563 [00:40<00:34, 20.58it/s]

batch 860 loss: 0.3311197578907013


Train, Epoch 3 / 20:  56%|█████▌    | 874/1563 [00:41<00:31, 21.54it/s]

batch 870 loss: 0.33495764285326


Train, Epoch 3 / 20:  56%|█████▋    | 883/1563 [00:41<00:31, 21.73it/s]

batch 880 loss: 0.2275110550224781


Train, Epoch 3 / 20:  57%|█████▋    | 892/1563 [00:41<00:30, 22.07it/s]

batch 890 loss: 0.269349979609251


Train, Epoch 3 / 20:  58%|█████▊    | 904/1563 [00:42<00:29, 22.07it/s]

batch 900 loss: 0.22874584048986435


Train, Epoch 3 / 20:  58%|█████▊    | 913/1563 [00:42<00:29, 21.97it/s]

batch 910 loss: 0.25827106535434724


Train, Epoch 3 / 20:  59%|█████▉    | 922/1563 [00:43<00:29, 21.82it/s]

batch 920 loss: 0.2393836636096239


Train, Epoch 3 / 20:  60%|█████▉    | 934/1563 [00:43<00:27, 22.69it/s]

batch 930 loss: 0.254123966768384


Train, Epoch 3 / 20:  60%|██████    | 943/1563 [00:44<00:27, 22.22it/s]

batch 940 loss: 0.2475224021822214


Train, Epoch 3 / 20:  61%|██████    | 952/1563 [00:44<00:27, 22.41it/s]

batch 950 loss: 0.2856183469295502


Train, Epoch 3 / 20:  62%|██████▏   | 964/1563 [00:45<00:26, 22.37it/s]

batch 960 loss: 0.2590852230787277


Train, Epoch 3 / 20:  62%|██████▏   | 973/1563 [00:45<00:26, 22.62it/s]

batch 970 loss: 0.2086111158132553


Train, Epoch 3 / 20:  63%|██████▎   | 982/1563 [00:45<00:25, 22.59it/s]

batch 980 loss: 0.3274644397199154


Train, Epoch 3 / 20:  64%|██████▎   | 994/1563 [00:46<00:25, 22.58it/s]

batch 990 loss: 0.2561994355171919


Train, Epoch 3 / 20:  64%|██████▍   | 1003/1563 [00:46<00:24, 22.70it/s]

batch 1000 loss: 0.26164126992225645


Train, Epoch 3 / 20:  65%|██████▍   | 1012/1563 [00:47<00:24, 22.21it/s]

batch 1010 loss: 0.2694708302617073


Train, Epoch 3 / 20:  66%|██████▌   | 1024/1563 [00:47<00:23, 22.56it/s]

batch 1020 loss: 0.2722099140286446


Train, Epoch 3 / 20:  66%|██████▌   | 1033/1563 [00:48<00:23, 22.23it/s]

batch 1030 loss: 0.32538274452090266


Train, Epoch 3 / 20:  67%|██████▋   | 1042/1563 [00:48<00:23, 22.44it/s]

batch 1040 loss: 0.32192985713481903


Train, Epoch 3 / 20:  67%|██████▋   | 1054/1563 [00:49<00:22, 22.57it/s]

batch 1050 loss: 0.2886475145816803


Train, Epoch 3 / 20:  68%|██████▊   | 1063/1563 [00:49<00:22, 22.42it/s]

batch 1060 loss: 0.27794342413544654


Train, Epoch 3 / 20:  69%|██████▊   | 1072/1563 [00:49<00:21, 22.35it/s]

batch 1070 loss: 0.27020236626267435


Train, Epoch 3 / 20:  69%|██████▉   | 1083/1563 [00:50<00:24, 19.33it/s]

batch 1080 loss: 0.25221110582351686


Train, Epoch 3 / 20:  70%|██████▉   | 1092/1563 [00:50<00:24, 19.05it/s]

batch 1090 loss: 0.300323898345232


Train, Epoch 3 / 20:  71%|███████   | 1102/1563 [00:51<00:26, 17.33it/s]

batch 1100 loss: 0.22739287465810776


Train, Epoch 3 / 20:  71%|███████   | 1112/1563 [00:52<00:25, 17.72it/s]

batch 1110 loss: 0.29780822098255155


Train, Epoch 3 / 20:  72%|███████▏  | 1123/1563 [00:52<00:21, 20.85it/s]

batch 1120 loss: 0.2839900977909565


Train, Epoch 3 / 20:  72%|███████▏  | 1132/1563 [00:53<00:19, 21.88it/s]

batch 1130 loss: 0.25748620107769965


Train, Epoch 3 / 20:  73%|███████▎  | 1144/1563 [00:53<00:18, 22.42it/s]

batch 1140 loss: 0.27371296361088754


Train, Epoch 3 / 20:  74%|███████▍  | 1153/1563 [00:53<00:18, 22.71it/s]

batch 1150 loss: 0.25088932141661646


Train, Epoch 3 / 20:  74%|███████▍  | 1162/1563 [00:54<00:17, 22.28it/s]

batch 1160 loss: 0.27327931821346285


Train, Epoch 3 / 20:  75%|███████▌  | 1174/1563 [00:54<00:17, 21.89it/s]

batch 1170 loss: 0.2811339393258095


Train, Epoch 3 / 20:  76%|███████▌  | 1183/1563 [00:55<00:16, 22.58it/s]

batch 1180 loss: 0.19756582751870155


Train, Epoch 3 / 20:  76%|███████▋  | 1192/1563 [00:55<00:16, 22.50it/s]

batch 1190 loss: 0.28829085528850557


Train, Epoch 3 / 20:  77%|███████▋  | 1204/1563 [00:56<00:15, 22.66it/s]

batch 1200 loss: 0.23202174082398414


Train, Epoch 3 / 20:  78%|███████▊  | 1213/1563 [00:56<00:15, 22.49it/s]

batch 1210 loss: 0.3729476109147072


Train, Epoch 3 / 20:  78%|███████▊  | 1222/1563 [00:57<00:15, 22.50it/s]

batch 1220 loss: 0.32428728193044665


Train, Epoch 3 / 20:  79%|███████▉  | 1234/1563 [00:57<00:14, 22.31it/s]

batch 1230 loss: 0.2634511739015579


Train, Epoch 3 / 20:  80%|███████▉  | 1243/1563 [00:57<00:14, 22.74it/s]

batch 1240 loss: 0.2768839649856091


Train, Epoch 3 / 20:  80%|████████  | 1252/1563 [00:58<00:14, 22.16it/s]

batch 1250 loss: 0.1532752588391304


Train, Epoch 3 / 20:  81%|████████  | 1264/1563 [00:58<00:13, 22.57it/s]

batch 1260 loss: 0.26552932262420653


Train, Epoch 3 / 20:  81%|████████▏ | 1273/1563 [00:59<00:12, 22.48it/s]

batch 1270 loss: 0.22165549397468567


Train, Epoch 3 / 20:  82%|████████▏ | 1282/1563 [00:59<00:12, 22.58it/s]

batch 1280 loss: 0.21826687455177307


Train, Epoch 3 / 20:  83%|████████▎ | 1294/1563 [01:00<00:11, 22.94it/s]

batch 1290 loss: 0.2539468500763178


Train, Epoch 3 / 20:  83%|████████▎ | 1303/1563 [01:00<00:11, 22.37it/s]

batch 1300 loss: 0.24894679263234137


Train, Epoch 3 / 20:  84%|████████▍ | 1312/1563 [01:01<00:11, 22.53it/s]

batch 1310 loss: 0.2297954060137272


Train, Epoch 3 / 20:  85%|████████▍ | 1324/1563 [01:01<00:10, 22.35it/s]

batch 1320 loss: 0.29693654328584673


Train, Epoch 3 / 20:  85%|████████▌ | 1333/1563 [01:01<00:10, 22.36it/s]

batch 1330 loss: 0.1962319415062666


Train, Epoch 3 / 20:  86%|████████▌ | 1342/1563 [01:02<00:10, 20.18it/s]

batch 1340 loss: 0.2735157899558544


Train, Epoch 3 / 20:  87%|████████▋ | 1353/1563 [01:03<00:10, 19.33it/s]

batch 1350 loss: 0.38624860942363737


Train, Epoch 3 / 20:  87%|████████▋ | 1363/1563 [01:03<00:11, 17.39it/s]

batch 1360 loss: 0.3209304705262184


Train, Epoch 3 / 20:  88%|████████▊ | 1373/1563 [01:04<00:10, 17.43it/s]

batch 1370 loss: 0.2565653555095196


Train, Epoch 3 / 20:  88%|████████▊ | 1382/1563 [01:04<00:08, 20.59it/s]

batch 1380 loss: 0.2502398818731308


Train, Epoch 3 / 20:  89%|████████▉ | 1394/1563 [01:05<00:07, 21.84it/s]

batch 1390 loss: 0.27810858115553855


Train, Epoch 3 / 20:  90%|████████▉ | 1403/1563 [01:05<00:07, 22.05it/s]

batch 1400 loss: 0.3062055006623268


Train, Epoch 3 / 20:  90%|█████████ | 1412/1563 [01:05<00:06, 22.29it/s]

batch 1410 loss: 0.23016418814659118


Train, Epoch 3 / 20:  91%|█████████ | 1424/1563 [01:06<00:06, 22.69it/s]

batch 1420 loss: 0.31280737891793253


Train, Epoch 3 / 20:  92%|█████████▏| 1433/1563 [01:06<00:05, 22.88it/s]

batch 1430 loss: 0.2519353456795216


Train, Epoch 3 / 20:  92%|█████████▏| 1442/1563 [01:07<00:05, 22.47it/s]

batch 1440 loss: 0.1954238723963499


Train, Epoch 3 / 20:  93%|█████████▎| 1454/1563 [01:07<00:04, 22.82it/s]

batch 1450 loss: 0.33046996742486956


Train, Epoch 3 / 20:  94%|█████████▎| 1463/1563 [01:08<00:04, 22.57it/s]

batch 1460 loss: 0.24244313016533853


Train, Epoch 3 / 20:  94%|█████████▍| 1472/1563 [01:08<00:04, 22.51it/s]

batch 1470 loss: 0.30511624366045


Train, Epoch 3 / 20:  95%|█████████▍| 1484/1563 [01:09<00:03, 22.20it/s]

batch 1480 loss: 0.36934852600097656


Train, Epoch 3 / 20:  96%|█████████▌| 1493/1563 [01:09<00:03, 22.00it/s]

batch 1490 loss: 0.24818959310650826


Train, Epoch 3 / 20:  96%|█████████▌| 1502/1563 [01:10<00:02, 21.87it/s]

batch 1500 loss: 0.27390529960393906


Train, Epoch 3 / 20:  97%|█████████▋| 1514/1563 [01:10<00:02, 22.09it/s]

batch 1510 loss: 0.19826358780264855


Train, Epoch 3 / 20:  97%|█████████▋| 1523/1563 [01:10<00:01, 22.05it/s]

batch 1520 loss: 0.33200887590646744


Train, Epoch 3 / 20:  98%|█████████▊| 1532/1563 [01:11<00:01, 22.23it/s]

batch 1530 loss: 0.23633320406079292


Train, Epoch 3 / 20:  99%|█████████▉| 1544/1563 [01:11<00:00, 22.37it/s]

batch 1540 loss: 0.2267143800854683


Train, Epoch 3 / 20:  99%|█████████▉| 1553/1563 [01:12<00:00, 22.20it/s]

batch 1550 loss: 0.21090799383819103


Train, Epoch 3 / 20: 100%|██████████| 1563/1563 [01:12<00:00, 21.48it/s]


batch 1560 loss: 0.2673740781843662


Test, Epoch 3 / 20: 100%|██████████| 1563/1563 [00:36<00:00, 42.27it/s]


Epoch 3, loss: 0.4956076192457974, accuracy: 0.80368


Train, Epoch 4 / 20:   1%|          | 14/1563 [00:00<01:08, 22.52it/s]

batch 10 loss: 0.2114180400967598


Train, Epoch 4 / 20:   1%|▏         | 23/1563 [00:01<01:18, 19.70it/s]

batch 20 loss: 0.27568153440952303


Train, Epoch 4 / 20:   2%|▏         | 33/1563 [00:01<01:24, 18.07it/s]

batch 30 loss: 0.2359623372554779


Train, Epoch 4 / 20:   3%|▎         | 43/1563 [00:02<01:29, 16.93it/s]

batch 40 loss: 0.2197956696152687


Train, Epoch 4 / 20:   3%|▎         | 53/1563 [00:02<01:27, 17.20it/s]

batch 50 loss: 0.1764220677316189


Train, Epoch 4 / 20:   4%|▍         | 64/1563 [00:03<01:10, 21.21it/s]

batch 60 loss: 0.3354602545499802


Train, Epoch 4 / 20:   5%|▍         | 73/1563 [00:03<01:07, 22.05it/s]

batch 70 loss: 0.1611011616885662


Train, Epoch 4 / 20:   5%|▌         | 82/1563 [00:04<01:06, 22.18it/s]

batch 80 loss: 0.25931706577539443


Train, Epoch 4 / 20:   6%|▌         | 94/1563 [00:04<01:06, 22.01it/s]

batch 90 loss: 0.20729373544454574


Train, Epoch 4 / 20:   7%|▋         | 103/1563 [00:05<01:05, 22.27it/s]

batch 100 loss: 0.23019708022475244


Train, Epoch 4 / 20:   7%|▋         | 112/1563 [00:05<01:05, 22.17it/s]

batch 110 loss: 0.3045224115252495


Train, Epoch 4 / 20:   8%|▊         | 124/1563 [00:06<01:04, 22.40it/s]

batch 120 loss: 0.2751887507736683


Train, Epoch 4 / 20:   9%|▊         | 133/1563 [00:06<01:03, 22.52it/s]

batch 130 loss: 0.2551283299922943


Train, Epoch 4 / 20:   9%|▉         | 142/1563 [00:06<01:03, 22.54it/s]

batch 140 loss: 0.22799245417118072


Train, Epoch 4 / 20:  10%|▉         | 154/1563 [00:07<01:02, 22.60it/s]

batch 150 loss: 0.30223949924111365


Train, Epoch 4 / 20:  10%|█         | 163/1563 [00:07<01:02, 22.47it/s]

batch 160 loss: 0.22472127489745616


Train, Epoch 4 / 20:  11%|█         | 172/1563 [00:08<01:02, 22.17it/s]

batch 170 loss: 0.27529462054371834


Train, Epoch 4 / 20:  12%|█▏        | 184/1563 [00:08<01:01, 22.31it/s]

batch 180 loss: 0.23518242165446282


Train, Epoch 4 / 20:  12%|█▏        | 193/1563 [00:09<01:00, 22.66it/s]

batch 190 loss: 0.23739858940243722


Train, Epoch 4 / 20:  13%|█▎        | 202/1563 [00:09<01:01, 22.22it/s]

batch 200 loss: 0.2154719065874815


Train, Epoch 4 / 20:  14%|█▎        | 214/1563 [00:10<00:59, 22.62it/s]

batch 210 loss: 0.2517849698662758


Train, Epoch 4 / 20:  14%|█▍        | 223/1563 [00:10<01:00, 22.30it/s]

batch 220 loss: 0.266401095688343


Train, Epoch 4 / 20:  15%|█▍        | 232/1563 [00:10<00:59, 22.51it/s]

batch 230 loss: 0.28278440460562704


Train, Epoch 4 / 20:  16%|█▌        | 244/1563 [00:11<00:58, 22.45it/s]

batch 240 loss: 0.254184877127409


Train, Epoch 4 / 20:  16%|█▌        | 253/1563 [00:11<00:58, 22.42it/s]

batch 250 loss: 0.21381597742438316


Train, Epoch 4 / 20:  17%|█▋        | 262/1563 [00:12<00:58, 22.31it/s]

batch 260 loss: 0.20321406945586204


Train, Epoch 4 / 20:  18%|█▊        | 274/1563 [00:12<00:57, 22.35it/s]

batch 270 loss: 0.24953263029456138


Train, Epoch 4 / 20:  18%|█▊        | 283/1563 [00:13<01:04, 19.86it/s]

batch 280 loss: 0.2705467872321606


Train, Epoch 4 / 20:  19%|█▊        | 292/1563 [00:13<01:10, 18.09it/s]

batch 290 loss: 0.29283793121576307


Train, Epoch 4 / 20:  19%|█▉        | 302/1563 [00:14<01:10, 17.88it/s]

batch 300 loss: 0.23488210886716843


Train, Epoch 4 / 20:  20%|█▉        | 312/1563 [00:14<01:12, 17.35it/s]

batch 310 loss: 0.26844092160463334


Train, Epoch 4 / 20:  21%|██        | 324/1563 [00:15<00:57, 21.67it/s]

batch 320 loss: 0.21409795805811882


Train, Epoch 4 / 20:  21%|██▏       | 333/1563 [00:15<00:56, 21.94it/s]

batch 330 loss: 0.2876598834991455


Train, Epoch 4 / 20:  22%|██▏       | 342/1563 [00:16<00:54, 22.36it/s]

batch 340 loss: 0.2476534366607666


Train, Epoch 4 / 20:  23%|██▎       | 354/1563 [00:16<00:54, 22.38it/s]

batch 350 loss: 0.17307529039680958


Train, Epoch 4 / 20:  23%|██▎       | 363/1563 [00:17<00:54, 22.18it/s]

batch 360 loss: 0.19914351999759675


Train, Epoch 4 / 20:  24%|██▍       | 372/1563 [00:17<00:52, 22.53it/s]

batch 370 loss: 0.2937754034996033


Train, Epoch 4 / 20:  25%|██▍       | 384/1563 [00:18<00:52, 22.28it/s]

batch 380 loss: 0.23202825859189033


Train, Epoch 4 / 20:  25%|██▌       | 393/1563 [00:18<00:51, 22.53it/s]

batch 390 loss: 0.1999623842537403


Train, Epoch 4 / 20:  26%|██▌       | 402/1563 [00:18<00:51, 22.35it/s]

batch 400 loss: 0.27812790423631667


Train, Epoch 4 / 20:  26%|██▋       | 414/1563 [00:19<00:50, 22.79it/s]

batch 410 loss: 0.3251990512013435


Train, Epoch 4 / 20:  27%|██▋       | 423/1563 [00:19<00:50, 22.56it/s]

batch 420 loss: 0.24461451023817063


Train, Epoch 4 / 20:  28%|██▊       | 432/1563 [00:20<00:50, 22.61it/s]

batch 430 loss: 0.2520031169056892


Train, Epoch 4 / 20:  28%|██▊       | 444/1563 [00:20<00:50, 22.03it/s]

batch 440 loss: 0.24027770161628723


Train, Epoch 4 / 20:  29%|██▉       | 453/1563 [00:21<00:49, 22.28it/s]

batch 450 loss: 0.24778163135051728


Train, Epoch 4 / 20:  30%|██▉       | 462/1563 [00:21<00:49, 22.22it/s]

batch 460 loss: 0.27563012316823005


Train, Epoch 4 / 20:  30%|███       | 474/1563 [00:22<00:48, 22.45it/s]

batch 470 loss: 0.28310339748859403


Train, Epoch 4 / 20:  31%|███       | 483/1563 [00:22<00:48, 22.50it/s]

batch 480 loss: 0.2431316129863262


Train, Epoch 4 / 20:  31%|███▏      | 492/1563 [00:22<00:48, 22.28it/s]

batch 490 loss: 0.17001896314322948


Train, Epoch 4 / 20:  32%|███▏      | 504/1563 [00:23<00:47, 22.26it/s]

batch 500 loss: 0.2407403141260147


Train, Epoch 4 / 20:  33%|███▎      | 513/1563 [00:23<00:47, 22.12it/s]

batch 510 loss: 0.3736498787999153


Train, Epoch 4 / 20:  33%|███▎      | 522/1563 [00:24<00:47, 22.08it/s]

batch 520 loss: 0.2621198922395706


Train, Epoch 4 / 20:  34%|███▍      | 534/1563 [00:24<00:45, 22.49it/s]

batch 530 loss: 0.2508090779185295


Train, Epoch 4 / 20:  35%|███▍      | 543/1563 [00:25<00:50, 20.21it/s]

batch 540 loss: 0.2578807003796101


Train, Epoch 4 / 20:  35%|███▌      | 552/1563 [00:25<00:55, 18.37it/s]

batch 550 loss: 0.21495917066931725


Train, Epoch 4 / 20:  36%|███▌      | 562/1563 [00:26<00:55, 17.93it/s]

batch 560 loss: 0.20849285125732422


Train, Epoch 4 / 20:  37%|███▋      | 572/1563 [00:26<00:56, 17.67it/s]

batch 570 loss: 0.26052985787391664


Train, Epoch 4 / 20:  37%|███▋      | 584/1563 [00:27<00:46, 21.08it/s]

batch 580 loss: 0.2524506576359272


Train, Epoch 4 / 20:  38%|███▊      | 593/1563 [00:27<00:44, 21.78it/s]

batch 590 loss: 0.26815901696681976


Train, Epoch 4 / 20:  38%|███▊      | 598/1563 [00:28<00:45, 21.26it/s]


KeyboardInterrupt: 

In [27]:
from torch.utils.data import DataLoader

naive_moe_config = PretrainedConfig(
    **base_config,
    num_experts=4,
    capacity_factor=2.0,
    num_experts_per_token=1,
    ff_cls=NaiveMoE
)

train_loader = DataLoader(tokenized_dataset['train'], batch_size=16, shuffle=True)
test_loader = DataLoader(tokenized_dataset['test'], batch_size=16, shuffle=False)

DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
# model = TransformerClassifier(naive_moe_config).to(DEVICE)
model = TransformerClassifier(standard_config).to(DEVICE)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)

In [28]:
from tqdm import tqdm

NUM_OF_EPOCHS = 20

for epoch in range(NUM_OF_EPOCHS):
    model.train()
    train_progress_bar = tqdm(train_loader, desc=f'Train, Epoch {epoch + 1} / {NUM_OF_EPOCHS}')
    running_loss = 0.
    for i, batch in enumerate(train_progress_bar):
        x, y = batch['input_ids'], batch['label']
        x = torch.stack(x, dim=1).to(DEVICE)
        y = y.to(DEVICE)
        optimizer.zero_grad()
        loss = model(x, y)['loss']
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

        if i % 10 == 9:
            last_loss = running_loss / 10 # avg loss per batch
            print('batch {} loss: {}'.format(i + 1, last_loss))
            running_loss = 0.

    model.eval()
    with torch.no_grad():
        total_loss = 0
        total_samples = 0
        correct_samples = 0
        test_progress_bar = tqdm(test_loader, desc=f'Test, Epoch {epoch + 1} / {NUM_OF_EPOCHS}')
        for batch in test_progress_bar:
            x, y = batch['input_ids'], batch['label']
            x = torch.stack(x, dim=1).to(DEVICE)
            y = y.to(DEVICE)
            logits = model(x)['logits']
            total_loss += F.cross_entropy(logits, y, reduction='sum').item()
            total_samples += y.shape[0]
            correct_samples += (logits.argmax(dim=-1) == y).sum().item()

        print(f'Epoch {epoch + 1}, loss: {total_loss / total_samples}, accuracy: {correct_samples / total_samples}')

Train, Epoch 1 / 20:   1%|          | 14/1563 [00:00<00:46, 33.44it/s]

batch 10 loss: 0.8985028684139251


Train, Epoch 1 / 20:   2%|▏         | 26/1563 [00:00<00:42, 36.22it/s]

batch 20 loss: 0.745098939538002


Train, Epoch 1 / 20:   2%|▏         | 34/1563 [00:00<00:43, 35.02it/s]

batch 30 loss: 0.7758063852787018


Train, Epoch 1 / 20:   3%|▎         | 46/1563 [00:01<00:42, 36.09it/s]

batch 40 loss: 0.8223558843135834


Train, Epoch 1 / 20:   3%|▎         | 54/1563 [00:01<00:41, 36.43it/s]

batch 50 loss: 0.7013353109359741


Train, Epoch 1 / 20:   4%|▍         | 66/1563 [00:01<00:40, 36.97it/s]

batch 60 loss: 0.7542852997779846


Train, Epoch 1 / 20:   5%|▍         | 74/1563 [00:02<00:40, 36.72it/s]

batch 70 loss: 0.7141084194183349


Train, Epoch 1 / 20:   6%|▌         | 86/1563 [00:02<00:39, 37.44it/s]

batch 80 loss: 0.7132517576217652


Train, Epoch 1 / 20:   6%|▌         | 94/1563 [00:02<00:39, 37.20it/s]

batch 90 loss: 0.6994368314743042


Train, Epoch 1 / 20:   7%|▋         | 106/1563 [00:02<00:39, 36.70it/s]

batch 100 loss: 0.7003930032253265


Train, Epoch 1 / 20:   7%|▋         | 114/1563 [00:03<00:38, 37.45it/s]

batch 110 loss: 0.6601896047592163


Train, Epoch 1 / 20:   8%|▊         | 126/1563 [00:03<00:38, 37.80it/s]

batch 120 loss: 0.6906091868877411


Train, Epoch 1 / 20:   9%|▊         | 134/1563 [00:03<00:38, 37.14it/s]

batch 130 loss: 0.7012782514095306


Train, Epoch 1 / 20:   9%|▉         | 146/1563 [00:04<00:37, 37.30it/s]

batch 140 loss: 0.7099199831485749


Train, Epoch 1 / 20:  10%|▉         | 154/1563 [00:04<00:37, 37.60it/s]

batch 150 loss: 0.7493707537651062


Train, Epoch 1 / 20:  11%|█         | 166/1563 [00:04<00:37, 37.69it/s]

batch 160 loss: 0.65621896982193


Train, Epoch 1 / 20:  11%|█         | 174/1563 [00:04<00:37, 36.57it/s]

batch 170 loss: 0.7028757214546204


Train, Epoch 1 / 20:  12%|█▏        | 186/1563 [00:05<00:40, 34.19it/s]

batch 180 loss: 0.6985395789146424


Train, Epoch 1 / 20:  12%|█▏        | 194/1563 [00:05<00:40, 33.98it/s]

batch 190 loss: 0.6915748298168183


Train, Epoch 1 / 20:  13%|█▎        | 206/1563 [00:05<00:40, 33.84it/s]

batch 200 loss: 0.7130996346473694


Train, Epoch 1 / 20:  14%|█▎        | 214/1563 [00:05<00:40, 33.07it/s]

batch 210 loss: 0.7539804875850677


Train, Epoch 1 / 20:  14%|█▍        | 226/1563 [00:06<00:40, 33.30it/s]

batch 220 loss: 0.8040502071380615


Train, Epoch 1 / 20:  15%|█▍        | 234/1563 [00:06<00:40, 33.15it/s]

batch 230 loss: 0.6730391949415206


Train, Epoch 1 / 20:  16%|█▌        | 246/1563 [00:06<00:37, 35.43it/s]

batch 240 loss: 0.6935271799564362


Train, Epoch 1 / 20:  16%|█▋        | 254/1563 [00:07<00:36, 36.16it/s]

batch 250 loss: 0.6453969180583954


Train, Epoch 1 / 20:  17%|█▋        | 266/1563 [00:07<00:35, 37.02it/s]

batch 260 loss: 0.7218855202198029


Train, Epoch 1 / 20:  18%|█▊        | 274/1563 [00:07<00:35, 36.60it/s]

batch 270 loss: 0.6939124345779419


Train, Epoch 1 / 20:  18%|█▊        | 286/1563 [00:07<00:34, 36.65it/s]

batch 280 loss: 0.6858639776706695


Train, Epoch 1 / 20:  19%|█▉        | 294/1563 [00:08<00:34, 36.94it/s]

batch 290 loss: 0.6641269445419311


Train, Epoch 1 / 20:  20%|█▉        | 306/1563 [00:08<00:33, 37.42it/s]

batch 300 loss: 0.6687982082366943


Train, Epoch 1 / 20:  20%|██        | 314/1563 [00:08<00:33, 37.14it/s]

batch 310 loss: 0.6910812020301819


Train, Epoch 1 / 20:  21%|██        | 326/1563 [00:09<00:33, 37.07it/s]

batch 320 loss: 0.6827877581119537


Train, Epoch 1 / 20:  21%|██▏       | 334/1563 [00:09<00:33, 36.52it/s]

batch 330 loss: 0.683169162273407


Train, Epoch 1 / 20:  22%|██▏       | 346/1563 [00:09<00:33, 36.25it/s]

batch 340 loss: 0.6791384398937226


Train, Epoch 1 / 20:  23%|██▎       | 354/1563 [00:09<00:32, 37.09it/s]

batch 350 loss: 0.7116446614265441


Train, Epoch 1 / 20:  23%|██▎       | 366/1563 [00:10<00:32, 36.54it/s]

batch 360 loss: 0.7649791300296783


Train, Epoch 1 / 20:  24%|██▍       | 374/1563 [00:10<00:32, 36.40it/s]

batch 370 loss: 0.7319849848747253


Train, Epoch 1 / 20:  25%|██▍       | 386/1563 [00:10<00:32, 36.29it/s]

batch 380 loss: 0.6537059128284455


Train, Epoch 1 / 20:  25%|██▌       | 394/1563 [00:10<00:31, 36.89it/s]

batch 390 loss: 0.6740696668624878


Train, Epoch 1 / 20:  26%|██▌       | 406/1563 [00:11<00:31, 37.32it/s]

batch 400 loss: 0.6502443432807923


Train, Epoch 1 / 20:  26%|██▋       | 414/1563 [00:11<00:31, 37.05it/s]

batch 410 loss: 0.6658796727657318


Train, Epoch 1 / 20:  27%|██▋       | 426/1563 [00:11<00:30, 37.09it/s]

batch 420 loss: 0.6713829159736633


Train, Epoch 1 / 20:  28%|██▊       | 434/1563 [00:12<00:30, 37.32it/s]

batch 430 loss: 0.6899772822856903


Train, Epoch 1 / 20:  29%|██▊       | 446/1563 [00:12<00:30, 36.36it/s]

batch 440 loss: 0.7122270107269287


Train, Epoch 1 / 20:  29%|██▉       | 454/1563 [00:12<00:30, 36.24it/s]

batch 450 loss: 0.6825970649719239


Train, Epoch 1 / 20:  30%|██▉       | 466/1563 [00:12<00:29, 37.05it/s]

batch 460 loss: 0.6992752909660339


Train, Epoch 1 / 20:  30%|███       | 474/1563 [00:13<00:29, 37.19it/s]

batch 470 loss: 0.697822168469429


Train, Epoch 1 / 20:  31%|███       | 486/1563 [00:13<00:28, 37.61it/s]

batch 480 loss: 0.6586163878440857


Train, Epoch 1 / 20:  32%|███▏      | 494/1563 [00:13<00:28, 37.57it/s]

batch 490 loss: 0.6738010227680207


Train, Epoch 1 / 20:  32%|███▏      | 506/1563 [00:13<00:28, 37.72it/s]

batch 500 loss: 0.6882218778133392


Train, Epoch 1 / 20:  33%|███▎      | 514/1563 [00:14<00:28, 37.34it/s]

batch 510 loss: 0.6656189918518066


Train, Epoch 1 / 20:  34%|███▎      | 526/1563 [00:14<00:28, 36.79it/s]

batch 520 loss: 0.6677110910415649


Train, Epoch 1 / 20:  34%|███▍      | 534/1563 [00:14<00:28, 36.50it/s]

batch 530 loss: 0.6773120343685151


Train, Epoch 1 / 20:  35%|███▍      | 546/1563 [00:15<00:27, 37.04it/s]

batch 540 loss: 0.6588823199272156


Train, Epoch 1 / 20:  35%|███▌      | 554/1563 [00:15<00:27, 37.11it/s]

batch 550 loss: 0.6598330855369567


Train, Epoch 1 / 20:  36%|███▌      | 566/1563 [00:15<00:26, 37.26it/s]

batch 560 loss: 0.6208869278430938


Train, Epoch 1 / 20:  37%|███▋      | 574/1563 [00:15<00:26, 37.26it/s]

batch 570 loss: 0.6367771208286286


Train, Epoch 1 / 20:  37%|███▋      | 586/1563 [00:16<00:26, 36.38it/s]

batch 580 loss: 0.6559747338294983


Train, Epoch 1 / 20:  38%|███▊      | 594/1563 [00:16<00:26, 36.47it/s]

batch 590 loss: 0.6628825485706329


Train, Epoch 1 / 20:  39%|███▉      | 606/1563 [00:16<00:26, 36.79it/s]

batch 600 loss: 0.6638054132461548


Train, Epoch 1 / 20:  39%|███▉      | 614/1563 [00:16<00:27, 34.82it/s]

batch 610 loss: 0.6394840061664582


Train, Epoch 1 / 20:  40%|████      | 626/1563 [00:17<00:27, 34.04it/s]

batch 620 loss: 0.660628867149353


Train, Epoch 1 / 20:  41%|████      | 634/1563 [00:17<00:27, 33.50it/s]

batch 630 loss: 0.645998153090477


Train, Epoch 1 / 20:  41%|████▏     | 646/1563 [00:17<00:27, 33.73it/s]

batch 640 loss: 0.6119655549526215


Train, Epoch 1 / 20:  42%|████▏     | 654/1563 [00:18<00:26, 33.72it/s]

batch 650 loss: 0.7261533617973328


Train, Epoch 1 / 20:  43%|████▎     | 666/1563 [00:18<00:27, 33.18it/s]

batch 660 loss: 0.6470463573932648


Train, Epoch 1 / 20:  43%|████▎     | 674/1563 [00:18<00:26, 32.99it/s]

batch 670 loss: 0.6572129487991333


Train, Epoch 1 / 20:  44%|████▍     | 686/1563 [00:19<00:24, 35.55it/s]

batch 680 loss: 0.6747961401939392


Train, Epoch 1 / 20:  44%|████▍     | 694/1563 [00:19<00:24, 36.14it/s]

batch 690 loss: 0.6943644225597382


Train, Epoch 1 / 20:  45%|████▌     | 706/1563 [00:19<00:23, 36.41it/s]

batch 700 loss: 0.6387639462947845


Train, Epoch 1 / 20:  46%|████▌     | 714/1563 [00:19<00:23, 36.81it/s]

batch 710 loss: 0.6855954527854919


Train, Epoch 1 / 20:  46%|████▋     | 726/1563 [00:20<00:22, 37.45it/s]

batch 720 loss: 0.6729492664337158


Train, Epoch 1 / 20:  47%|████▋     | 734/1563 [00:20<00:22, 37.21it/s]

batch 730 loss: 0.7068779766559601


Train, Epoch 1 / 20:  48%|████▊     | 746/1563 [00:20<00:22, 36.95it/s]

batch 740 loss: 0.7126932382583618


Train, Epoch 1 / 20:  48%|████▊     | 754/1563 [00:20<00:21, 37.14it/s]

batch 750 loss: 0.6446284890174866


Train, Epoch 1 / 20:  49%|████▉     | 766/1563 [00:21<00:21, 37.28it/s]

batch 760 loss: 0.6456185102462768


Train, Epoch 1 / 20:  50%|████▉     | 774/1563 [00:21<00:20, 37.61it/s]

batch 770 loss: 0.6429074704647064


Train, Epoch 1 / 20:  50%|█████     | 786/1563 [00:21<00:20, 37.01it/s]

batch 780 loss: 0.6418729662895203


Train, Epoch 1 / 20:  51%|█████     | 794/1563 [00:21<00:20, 37.17it/s]

batch 790 loss: 0.6480269014835358


Train, Epoch 1 / 20:  52%|█████▏    | 806/1563 [00:22<00:20, 37.36it/s]

batch 800 loss: 0.7011474668979645


Train, Epoch 1 / 20:  52%|█████▏    | 814/1563 [00:22<00:20, 37.22it/s]

batch 810 loss: 0.6729663491249085


Train, Epoch 1 / 20:  53%|█████▎    | 826/1563 [00:22<00:20, 36.47it/s]

batch 820 loss: 0.6549346268177032


Train, Epoch 1 / 20:  53%|█████▎    | 834/1563 [00:23<00:19, 36.50it/s]

batch 830 loss: 0.6551573753356934


Train, Epoch 1 / 20:  54%|█████▍    | 846/1563 [00:23<00:19, 36.80it/s]

batch 840 loss: 0.6147096455097198


Train, Epoch 1 / 20:  55%|█████▍    | 854/1563 [00:23<00:19, 36.10it/s]

batch 850 loss: 0.6777885973453521


Train, Epoch 1 / 20:  55%|█████▌    | 866/1563 [00:23<00:18, 36.85it/s]

batch 860 loss: 0.6440800845623016


Train, Epoch 1 / 20:  56%|█████▌    | 874/1563 [00:24<00:18, 37.12it/s]

batch 870 loss: 0.6875703811645508


Train, Epoch 1 / 20:  57%|█████▋    | 886/1563 [00:24<00:18, 37.43it/s]

batch 880 loss: 0.6409802317619324


Train, Epoch 1 / 20:  57%|█████▋    | 894/1563 [00:24<00:17, 37.19it/s]

batch 890 loss: 0.6447867214679718


Train, Epoch 1 / 20:  58%|█████▊    | 906/1563 [00:24<00:17, 36.96it/s]

batch 900 loss: 0.6635799407958984


Train, Epoch 1 / 20:  58%|█████▊    | 914/1563 [00:25<00:17, 36.77it/s]

batch 910 loss: 0.660781842470169


Train, Epoch 1 / 20:  59%|█████▉    | 926/1563 [00:25<00:17, 37.10it/s]

batch 920 loss: 0.6379262775182724


Train, Epoch 1 / 20:  60%|█████▉    | 934/1563 [00:25<00:17, 36.91it/s]

batch 930 loss: 0.6774395287036896


Train, Epoch 1 / 20:  61%|██████    | 946/1563 [00:26<00:16, 36.77it/s]

batch 940 loss: 0.6730788707733154


Train, Epoch 1 / 20:  61%|██████    | 954/1563 [00:26<00:16, 36.52it/s]

batch 950 loss: 0.7239395439624786


Train, Epoch 1 / 20:  62%|██████▏   | 966/1563 [00:26<00:16, 36.00it/s]

batch 960 loss: 0.7004378139972687


Train, Epoch 1 / 20:  62%|██████▏   | 974/1563 [00:26<00:16, 36.46it/s]

batch 970 loss: 0.687486881017685


Train, Epoch 1 / 20:  63%|██████▎   | 986/1563 [00:27<00:15, 36.98it/s]

batch 980 loss: 0.6610557258129119


Train, Epoch 1 / 20:  64%|██████▎   | 994/1563 [00:27<00:15, 37.15it/s]

batch 990 loss: 0.6873931050300598


Train, Epoch 1 / 20:  64%|██████▍   | 1006/1563 [00:27<00:14, 37.31it/s]

batch 1000 loss: 0.7033306151628494


Train, Epoch 1 / 20:  65%|██████▍   | 1014/1563 [00:27<00:14, 37.34it/s]

batch 1010 loss: 0.6943825006484985


Train, Epoch 1 / 20:  66%|██████▌   | 1026/1563 [00:28<00:14, 37.01it/s]

batch 1020 loss: 0.6598503828048706


Train, Epoch 1 / 20:  66%|██████▌   | 1034/1563 [00:28<00:14, 36.77it/s]

batch 1030 loss: 0.6538849472999573


Train, Epoch 1 / 20:  67%|██████▋   | 1046/1563 [00:28<00:14, 35.27it/s]

batch 1040 loss: 0.7259750545024872


Train, Epoch 1 / 20:  67%|██████▋   | 1054/1563 [00:29<00:14, 34.22it/s]

batch 1050 loss: 0.6571329534053802


Train, Epoch 1 / 20:  68%|██████▊   | 1066/1563 [00:29<00:14, 33.53it/s]

batch 1060 loss: 0.7009374141693115


Train, Epoch 1 / 20:  69%|██████▊   | 1074/1563 [00:29<00:14, 33.39it/s]

batch 1070 loss: 0.634359884262085


Train, Epoch 1 / 20:  69%|██████▉   | 1086/1563 [00:30<00:14, 32.95it/s]

batch 1080 loss: 0.659554585814476


Train, Epoch 1 / 20:  70%|██████▉   | 1094/1563 [00:30<00:14, 32.97it/s]

batch 1090 loss: 0.6528152287006378


Train, Epoch 1 / 20:  71%|███████   | 1106/1563 [00:30<00:13, 32.88it/s]

batch 1100 loss: 0.6689007878303528


Train, Epoch 1 / 20:  71%|███████▏  | 1114/1563 [00:30<00:13, 34.38it/s]

batch 1110 loss: 0.6364502370357513


Train, Epoch 1 / 20:  72%|███████▏  | 1126/1563 [00:31<00:12, 36.21it/s]

batch 1120 loss: 0.5955018818378448


Train, Epoch 1 / 20:  73%|███████▎  | 1134/1563 [00:31<00:11, 36.76it/s]

batch 1130 loss: 0.6506943166255951


Train, Epoch 1 / 20:  73%|███████▎  | 1146/1563 [00:31<00:11, 36.70it/s]

batch 1140 loss: 0.6651442646980286


Train, Epoch 1 / 20:  74%|███████▍  | 1154/1563 [00:31<00:11, 36.48it/s]

batch 1150 loss: 0.6649473190307618


Train, Epoch 1 / 20:  75%|███████▍  | 1166/1563 [00:32<00:10, 36.71it/s]

batch 1160 loss: 0.6354784548282624


Train, Epoch 1 / 20:  75%|███████▌  | 1174/1563 [00:32<00:10, 36.81it/s]

batch 1170 loss: 0.6747315168380738


Train, Epoch 1 / 20:  76%|███████▌  | 1186/1563 [00:32<00:10, 36.38it/s]

batch 1180 loss: 0.6355646431446076


Train, Epoch 1 / 20:  76%|███████▋  | 1194/1563 [00:33<00:09, 37.14it/s]

batch 1190 loss: 0.6508319318294525


Train, Epoch 1 / 20:  77%|███████▋  | 1206/1563 [00:33<00:09, 37.39it/s]

batch 1200 loss: 0.6300474345684052


Train, Epoch 1 / 20:  78%|███████▊  | 1214/1563 [00:33<00:09, 37.39it/s]

batch 1210 loss: 0.6028156459331513


Train, Epoch 1 / 20:  78%|███████▊  | 1226/1563 [00:33<00:09, 36.11it/s]

batch 1220 loss: 0.6659299373626709


Train, Epoch 1 / 20:  79%|███████▉  | 1234/1563 [00:34<00:09, 36.53it/s]

batch 1230 loss: 0.6730619728565216


Train, Epoch 1 / 20:  80%|███████▉  | 1246/1563 [00:34<00:08, 36.81it/s]

batch 1240 loss: 0.6739335536956788


Train, Epoch 1 / 20:  80%|████████  | 1254/1563 [00:34<00:08, 35.97it/s]

batch 1250 loss: 0.6546675384044647


Train, Epoch 1 / 20:  81%|████████  | 1266/1563 [00:34<00:08, 36.19it/s]

batch 1260 loss: 0.6848665773868561


Train, Epoch 1 / 20:  82%|████████▏ | 1274/1563 [00:35<00:07, 36.79it/s]

batch 1270 loss: 0.625486183166504


Train, Epoch 1 / 20:  82%|████████▏ | 1286/1563 [00:35<00:07, 36.64it/s]

batch 1280 loss: 0.6497228801250458


Train, Epoch 1 / 20:  83%|████████▎ | 1294/1563 [00:35<00:07, 36.75it/s]

batch 1290 loss: 0.6303292751312256


Train, Epoch 1 / 20:  84%|████████▎ | 1306/1563 [00:36<00:06, 36.80it/s]

batch 1300 loss: 0.6871956288814545


Train, Epoch 1 / 20:  84%|████████▍ | 1314/1563 [00:36<00:06, 37.14it/s]

batch 1310 loss: 0.7358184337615967


Train, Epoch 1 / 20:  85%|████████▍ | 1326/1563 [00:36<00:06, 37.08it/s]

batch 1320 loss: 0.6135870754718781


Train, Epoch 1 / 20:  85%|████████▌ | 1334/1563 [00:36<00:06, 37.19it/s]

batch 1330 loss: 0.6519901871681213


Train, Epoch 1 / 20:  86%|████████▌ | 1346/1563 [00:37<00:05, 37.26it/s]

batch 1340 loss: 0.6818138480186462


Train, Epoch 1 / 20:  87%|████████▋ | 1354/1563 [00:37<00:05, 36.83it/s]

batch 1350 loss: 0.6629418551921844


Train, Epoch 1 / 20:  87%|████████▋ | 1366/1563 [00:37<00:05, 36.10it/s]

batch 1360 loss: 0.7211771726608276


Train, Epoch 1 / 20:  88%|████████▊ | 1374/1563 [00:37<00:05, 36.09it/s]

batch 1370 loss: 0.6837151169776916


Train, Epoch 1 / 20:  89%|████████▊ | 1386/1563 [00:38<00:04, 36.83it/s]

batch 1380 loss: 0.6455259501934052


Train, Epoch 1 / 20:  89%|████████▉ | 1394/1563 [00:38<00:04, 36.73it/s]

batch 1390 loss: 0.6250023186206818


Train, Epoch 1 / 20:  90%|████████▉ | 1406/1563 [00:38<00:04, 36.38it/s]

batch 1400 loss: 0.6277689933776855


Train, Epoch 1 / 20:  90%|█████████ | 1414/1563 [00:39<00:04, 36.45it/s]

batch 1410 loss: 0.6461368918418884


Train, Epoch 1 / 20:  91%|█████████ | 1426/1563 [00:39<00:03, 36.45it/s]

batch 1420 loss: 0.7425316095352172


Train, Epoch 1 / 20:  92%|█████████▏| 1434/1563 [00:39<00:03, 36.51it/s]

batch 1430 loss: 0.6422381222248077


Train, Epoch 1 / 20:  93%|█████████▎| 1446/1563 [00:39<00:03, 36.55it/s]

batch 1440 loss: 0.6287747740745544


Train, Epoch 1 / 20:  93%|█████████▎| 1454/1563 [00:40<00:03, 35.99it/s]

batch 1450 loss: 0.6891044318675995


Train, Epoch 1 / 20:  94%|█████████▍| 1466/1563 [00:40<00:02, 36.32it/s]

batch 1460 loss: 0.7195915341377258


Train, Epoch 1 / 20:  94%|█████████▍| 1474/1563 [00:40<00:02, 36.09it/s]

batch 1470 loss: 0.6113548278808594


Train, Epoch 1 / 20:  95%|█████████▌| 1486/1563 [00:41<00:02, 32.99it/s]

batch 1480 loss: 0.6185984790325165


Train, Epoch 1 / 20:  96%|█████████▌| 1494/1563 [00:41<00:02, 32.79it/s]

batch 1490 loss: 0.6720716118812561


Train, Epoch 1 / 20:  96%|█████████▋| 1506/1563 [00:41<00:01, 33.70it/s]

batch 1500 loss: 0.6674747407436371


Train, Epoch 1 / 20:  97%|█████████▋| 1514/1563 [00:41<00:01, 34.00it/s]

batch 1510 loss: 0.5987180650234223


Train, Epoch 1 / 20:  98%|█████████▊| 1526/1563 [00:42<00:01, 33.12it/s]

batch 1520 loss: 0.7077252328395843


Train, Epoch 1 / 20:  98%|█████████▊| 1534/1563 [00:42<00:00, 32.67it/s]

batch 1530 loss: 0.7026720643043518


Train, Epoch 1 / 20:  99%|█████████▉| 1546/1563 [00:42<00:00, 32.73it/s]

batch 1540 loss: 0.6756873428821564


Train, Epoch 1 / 20:  99%|█████████▉| 1554/1563 [00:43<00:00, 33.30it/s]

batch 1550 loss: 0.6472423136234283


Train, Epoch 1 / 20: 100%|██████████| 1563/1563 [00:43<00:00, 36.04it/s]


batch 1560 loss: 0.6517514079809189


Test, Epoch 1 / 20: 100%|██████████| 1563/1563 [00:20<00:00, 76.15it/s]


Epoch 1, loss: 0.6336822968101501, accuracy: 0.64192


Train, Epoch 2 / 20:   1%|          | 16/1563 [00:00<00:42, 36.56it/s]

batch 10 loss: 0.6271848261356354


Train, Epoch 2 / 20:   2%|▏         | 24/1563 [00:00<00:42, 36.60it/s]

batch 20 loss: 0.636971628665924


Train, Epoch 2 / 20:   2%|▏         | 36/1563 [00:00<00:42, 35.58it/s]

batch 30 loss: 0.606209796667099


Train, Epoch 2 / 20:   3%|▎         | 44/1563 [00:01<00:44, 33.87it/s]

batch 40 loss: 0.6196892142295838


Train, Epoch 2 / 20:   4%|▎         | 56/1563 [00:01<00:44, 33.65it/s]

batch 50 loss: 0.6148012340068817


Train, Epoch 2 / 20:   4%|▍         | 64/1563 [00:01<00:45, 32.65it/s]

batch 60 loss: 0.6396029233932495


Train, Epoch 2 / 20:   5%|▍         | 76/1563 [00:02<00:45, 32.64it/s]

batch 70 loss: 0.6055969715118408


Train, Epoch 2 / 20:   5%|▌         | 84/1563 [00:02<00:45, 32.46it/s]

batch 80 loss: 0.6451368927955627


Train, Epoch 2 / 20:   6%|▌         | 96/1563 [00:02<00:46, 31.70it/s]

batch 90 loss: 0.6295189976692199


Train, Epoch 2 / 20:   7%|▋         | 104/1563 [00:03<00:45, 32.01it/s]

batch 100 loss: 0.6461378186941147


Train, Epoch 2 / 20:   7%|▋         | 116/1563 [00:03<00:41, 34.74it/s]

batch 110 loss: 0.6402128100395202


Train, Epoch 2 / 20:   8%|▊         | 124/1563 [00:03<00:41, 35.04it/s]

batch 120 loss: 0.6535213351249695


Train, Epoch 2 / 20:   9%|▊         | 136/1563 [00:03<00:40, 35.49it/s]

batch 130 loss: 0.6586893796920776


Train, Epoch 2 / 20:   9%|▉         | 144/1563 [00:04<00:39, 35.75it/s]

batch 140 loss: 0.6706075072288513


Train, Epoch 2 / 20:  10%|▉         | 156/1563 [00:04<00:39, 35.97it/s]

batch 150 loss: 0.6746218860149383


Train, Epoch 2 / 20:  10%|█         | 164/1563 [00:04<00:39, 35.71it/s]

batch 160 loss: 0.66772820353508


Train, Epoch 2 / 20:  11%|█▏        | 176/1563 [00:05<00:38, 36.48it/s]

batch 170 loss: 0.6340098917484284


Train, Epoch 2 / 20:  12%|█▏        | 184/1563 [00:05<00:37, 36.39it/s]

batch 180 loss: 0.6368857085704803


Train, Epoch 2 / 20:  13%|█▎        | 196/1563 [00:05<00:37, 36.63it/s]

batch 190 loss: 0.5974403530359268


Train, Epoch 2 / 20:  13%|█▎        | 204/1563 [00:05<00:37, 36.50it/s]

batch 200 loss: 0.6185163140296936


Train, Epoch 2 / 20:  14%|█▍        | 216/1563 [00:06<00:36, 36.48it/s]

batch 210 loss: 0.6256642580032349


Train, Epoch 2 / 20:  14%|█▍        | 224/1563 [00:06<00:36, 36.59it/s]

batch 220 loss: 0.6338073134422302


Train, Epoch 2 / 20:  15%|█▌        | 236/1563 [00:06<00:36, 35.93it/s]

batch 230 loss: 0.598995777964592


Train, Epoch 2 / 20:  16%|█▌        | 244/1563 [00:06<00:37, 35.54it/s]

batch 240 loss: 0.6553486466407776


Train, Epoch 2 / 20:  16%|█▋        | 256/1563 [00:07<00:36, 35.70it/s]

batch 250 loss: 0.6371177345514297


Train, Epoch 2 / 20:  17%|█▋        | 264/1563 [00:07<00:36, 35.61it/s]

batch 260 loss: 0.6231101155281067


Train, Epoch 2 / 20:  18%|█▊        | 276/1563 [00:07<00:36, 35.31it/s]

batch 270 loss: 0.6239610731601715


Train, Epoch 2 / 20:  18%|█▊        | 284/1563 [00:08<00:36, 35.26it/s]

batch 280 loss: 0.6027538388967514


Train, Epoch 2 / 20:  19%|█▉        | 296/1563 [00:08<00:35, 35.76it/s]

batch 290 loss: 0.6104309976100921


Train, Epoch 2 / 20:  19%|█▉        | 304/1563 [00:08<00:35, 35.71it/s]

batch 300 loss: 0.6738574922084808


Train, Epoch 2 / 20:  20%|██        | 316/1563 [00:09<00:35, 35.56it/s]

batch 310 loss: 0.6388796955347061


Train, Epoch 2 / 20:  21%|██        | 324/1563 [00:09<00:34, 36.00it/s]

batch 320 loss: 0.6168038636445999


Train, Epoch 2 / 20:  21%|██▏       | 336/1563 [00:09<00:33, 36.71it/s]

batch 330 loss: 0.6761977255344391


Train, Epoch 2 / 20:  22%|██▏       | 344/1563 [00:09<00:33, 36.72it/s]

batch 340 loss: 0.6816536724567414


Train, Epoch 2 / 20:  23%|██▎       | 356/1563 [00:10<00:32, 36.80it/s]

batch 350 loss: 0.6067950546741485


Train, Epoch 2 / 20:  23%|██▎       | 364/1563 [00:10<00:32, 36.71it/s]

batch 360 loss: 0.6960727095603942


Train, Epoch 2 / 20:  24%|██▍       | 376/1563 [00:10<00:32, 36.52it/s]

batch 370 loss: 0.6339294254779816


Train, Epoch 2 / 20:  25%|██▍       | 384/1563 [00:10<00:32, 36.64it/s]

batch 380 loss: 0.6242762863636017


Train, Epoch 2 / 20:  25%|██▌       | 396/1563 [00:11<00:32, 36.04it/s]

batch 390 loss: 0.6345470607280731


Train, Epoch 2 / 20:  26%|██▌       | 404/1563 [00:11<00:32, 35.88it/s]

batch 400 loss: 0.6369942545890808


Train, Epoch 2 / 20:  27%|██▋       | 416/1563 [00:11<00:31, 35.89it/s]

batch 410 loss: 0.654414489865303


Train, Epoch 2 / 20:  27%|██▋       | 424/1563 [00:11<00:31, 35.61it/s]

batch 420 loss: 0.676044550538063


Train, Epoch 2 / 20:  28%|██▊       | 436/1563 [00:12<00:31, 35.38it/s]

batch 430 loss: 0.6026568859815598


Train, Epoch 2 / 20:  28%|██▊       | 444/1563 [00:12<00:30, 36.11it/s]

batch 440 loss: 0.6332485646009445


Train, Epoch 2 / 20:  29%|██▉       | 456/1563 [00:12<00:30, 36.30it/s]

batch 450 loss: 0.6232675611972809


Train, Epoch 2 / 20:  30%|██▉       | 464/1563 [00:13<00:31, 35.10it/s]

batch 460 loss: 0.6017514258623123


Train, Epoch 2 / 20:  30%|███       | 476/1563 [00:13<00:32, 33.73it/s]

batch 470 loss: 0.6239404320716858


Train, Epoch 2 / 20:  31%|███       | 484/1563 [00:13<00:32, 33.41it/s]

batch 480 loss: 0.622676157951355


Train, Epoch 2 / 20:  32%|███▏      | 496/1563 [00:14<00:33, 32.32it/s]

batch 490 loss: 0.6481781840324402


Train, Epoch 2 / 20:  32%|███▏      | 504/1563 [00:14<00:32, 32.65it/s]

batch 500 loss: 0.6059484601020813


Train, Epoch 2 / 20:  33%|███▎      | 516/1563 [00:14<00:32, 32.66it/s]

batch 510 loss: 0.6206366032361984


Train, Epoch 2 / 20:  34%|███▎      | 524/1563 [00:14<00:32, 32.14it/s]

batch 520 loss: 0.6416920065879822


Train, Epoch 2 / 20:  34%|███▍      | 536/1563 [00:15<00:30, 34.13it/s]

batch 530 loss: 0.6420937180519104


Train, Epoch 2 / 20:  35%|███▍      | 544/1563 [00:15<00:28, 35.57it/s]

batch 540 loss: 0.6188931584358215


Train, Epoch 2 / 20:  36%|███▌      | 556/1563 [00:15<00:28, 35.94it/s]

batch 550 loss: 0.6528117746114731


Train, Epoch 2 / 20:  36%|███▌      | 564/1563 [00:16<00:27, 35.82it/s]

batch 560 loss: 0.6056734085083008


Train, Epoch 2 / 20:  37%|███▋      | 576/1563 [00:16<00:27, 36.25it/s]

batch 570 loss: 0.6080363154411316


Train, Epoch 2 / 20:  37%|███▋      | 584/1563 [00:16<00:26, 36.54it/s]

batch 580 loss: 0.6094421595335007


Train, Epoch 2 / 20:  38%|███▊      | 596/1563 [00:16<00:26, 36.90it/s]

batch 590 loss: 0.6088404685258866


Train, Epoch 2 / 20:  39%|███▊      | 604/1563 [00:17<00:26, 36.54it/s]

batch 600 loss: 0.6650693237781524


Train, Epoch 2 / 20:  39%|███▉      | 616/1563 [00:17<00:25, 36.60it/s]

batch 610 loss: 0.6212649554014206


Train, Epoch 2 / 20:  40%|███▉      | 624/1563 [00:17<00:25, 36.88it/s]

batch 620 loss: 0.6186129748821259


Train, Epoch 2 / 20:  41%|████      | 636/1563 [00:18<00:25, 36.66it/s]

batch 630 loss: 0.5970323026180268


Train, Epoch 2 / 20:  41%|████      | 644/1563 [00:18<00:25, 36.51it/s]

batch 640 loss: 0.6416390240192413


Train, Epoch 2 / 20:  42%|████▏     | 656/1563 [00:18<00:24, 36.73it/s]

batch 650 loss: 0.6354155957698822


Train, Epoch 2 / 20:  42%|████▏     | 664/1563 [00:18<00:24, 36.92it/s]

batch 660 loss: 0.6530123591423035


Train, Epoch 2 / 20:  43%|████▎     | 676/1563 [00:19<00:23, 37.01it/s]

batch 670 loss: 0.5834189206361771


Train, Epoch 2 / 20:  44%|████▍     | 684/1563 [00:19<00:23, 37.11it/s]

batch 680 loss: 0.6349027305841446


Train, Epoch 2 / 20:  45%|████▍     | 696/1563 [00:19<00:23, 36.76it/s]

batch 690 loss: 0.6148915797472


Train, Epoch 2 / 20:  45%|████▌     | 704/1563 [00:19<00:23, 36.02it/s]

batch 700 loss: 0.6279996156692504


Train, Epoch 2 / 20:  46%|████▌     | 716/1563 [00:20<00:23, 36.80it/s]

batch 710 loss: 0.6242704957723617


Train, Epoch 2 / 20:  46%|████▋     | 724/1563 [00:20<00:22, 36.70it/s]

batch 720 loss: 0.6450224399566651


Train, Epoch 2 / 20:  47%|████▋     | 736/1563 [00:20<00:22, 36.90it/s]

batch 730 loss: 0.6351423978805542


Train, Epoch 2 / 20:  48%|████▊     | 744/1563 [00:20<00:22, 36.94it/s]

batch 740 loss: 0.6300949990749359


Train, Epoch 2 / 20:  48%|████▊     | 756/1563 [00:21<00:22, 36.48it/s]

batch 750 loss: 0.5428481608629226


Train, Epoch 2 / 20:  49%|████▉     | 764/1563 [00:21<00:21, 36.87it/s]

batch 760 loss: 0.5933311849832534


Train, Epoch 2 / 20:  50%|████▉     | 776/1563 [00:21<00:21, 37.43it/s]

batch 770 loss: 0.6147862702608109


Train, Epoch 2 / 20:  50%|█████     | 784/1563 [00:22<00:20, 37.16it/s]

batch 780 loss: 0.647946959733963


Train, Epoch 2 / 20:  51%|█████     | 796/1563 [00:22<00:21, 36.13it/s]

batch 790 loss: 0.6518547296524048


Train, Epoch 2 / 20:  51%|█████▏    | 804/1563 [00:22<00:20, 36.23it/s]

batch 800 loss: 0.6243829190731048


Train, Epoch 2 / 20:  52%|█████▏    | 816/1563 [00:22<00:20, 36.40it/s]

batch 810 loss: 0.5899995595216752


Train, Epoch 2 / 20:  53%|█████▎    | 824/1563 [00:23<00:20, 36.34it/s]

batch 820 loss: 0.6224984407424927


Train, Epoch 2 / 20:  53%|█████▎    | 836/1563 [00:23<00:19, 36.62it/s]

batch 830 loss: 0.6208590924739837


Train, Epoch 2 / 20:  54%|█████▍    | 844/1563 [00:23<00:19, 36.61it/s]

batch 840 loss: 0.5910519629716873


Train, Epoch 2 / 20:  55%|█████▍    | 856/1563 [00:24<00:19, 36.72it/s]

batch 850 loss: 0.5768365025520324


Train, Epoch 2 / 20:  55%|█████▌    | 864/1563 [00:24<00:19, 36.32it/s]

batch 860 loss: 0.5817905277013778


Train, Epoch 2 / 20:  56%|█████▌    | 876/1563 [00:24<00:19, 35.98it/s]

batch 870 loss: 0.613585478067398


Train, Epoch 2 / 20:  57%|█████▋    | 884/1563 [00:24<00:18, 36.29it/s]

batch 880 loss: 0.5891208171844482


Train, Epoch 2 / 20:  57%|█████▋    | 896/1563 [00:25<00:18, 35.42it/s]

batch 890 loss: 0.5691988527774811


Train, Epoch 2 / 20:  58%|█████▊    | 904/1563 [00:25<00:19, 33.45it/s]

batch 900 loss: 0.5638591259717941


Train, Epoch 2 / 20:  59%|█████▊    | 916/1563 [00:25<00:19, 33.46it/s]

batch 910 loss: 0.6011294931173324


Train, Epoch 2 / 20:  59%|█████▉    | 924/1563 [00:25<00:19, 32.74it/s]

batch 920 loss: 0.5659625232219696


Train, Epoch 2 / 20:  60%|█████▉    | 936/1563 [00:26<00:19, 32.89it/s]

batch 930 loss: 0.6770088851451874


Train, Epoch 2 / 20:  60%|██████    | 944/1563 [00:26<00:19, 31.95it/s]

batch 940 loss: 0.5462495058774948


Train, Epoch 2 / 20:  61%|██████    | 956/1563 [00:26<00:18, 32.15it/s]

batch 950 loss: 0.632432046532631


Train, Epoch 2 / 20:  62%|██████▏   | 964/1563 [00:27<00:18, 33.01it/s]

batch 960 loss: 0.630549019575119


Train, Epoch 2 / 20:  62%|██████▏   | 976/1563 [00:27<00:16, 35.05it/s]

batch 970 loss: 0.572083705663681


Train, Epoch 2 / 20:  63%|██████▎   | 984/1563 [00:27<00:16, 35.63it/s]

batch 980 loss: 0.5892010450363159


Train, Epoch 2 / 20:  64%|██████▎   | 996/1563 [00:28<00:15, 36.11it/s]

batch 990 loss: 0.5922538548707962


Train, Epoch 2 / 20:  64%|██████▍   | 1004/1563 [00:28<00:15, 36.56it/s]

batch 1000 loss: 0.6358540207147598


Train, Epoch 2 / 20:  65%|██████▌   | 1016/1563 [00:28<00:14, 36.87it/s]

batch 1010 loss: 0.5894974708557129


Train, Epoch 2 / 20:  66%|██████▌   | 1024/1563 [00:28<00:14, 36.76it/s]

batch 1020 loss: 0.6023842453956604


Train, Epoch 2 / 20:  66%|██████▋   | 1036/1563 [00:29<00:14, 37.13it/s]

batch 1030 loss: 0.6434711635112762


Train, Epoch 2 / 20:  67%|██████▋   | 1044/1563 [00:29<00:14, 36.60it/s]

batch 1040 loss: 0.6207197636365891


Train, Epoch 2 / 20:  68%|██████▊   | 1056/1563 [00:29<00:13, 36.90it/s]

batch 1050 loss: 0.6734367340803147


Train, Epoch 2 / 20:  68%|██████▊   | 1064/1563 [00:29<00:13, 36.86it/s]

batch 1060 loss: 0.6094916760921478


Train, Epoch 2 / 20:  69%|██████▉   | 1076/1563 [00:30<00:13, 36.62it/s]

batch 1070 loss: 0.6192271560430527


Train, Epoch 2 / 20:  69%|██████▉   | 1084/1563 [00:30<00:13, 36.77it/s]

batch 1080 loss: 0.5668957591056824


Train, Epoch 2 / 20:  70%|███████   | 1096/1563 [00:30<00:12, 36.93it/s]

batch 1090 loss: 0.6207317113876343


Train, Epoch 2 / 20:  71%|███████   | 1104/1563 [00:31<00:12, 37.15it/s]

batch 1100 loss: 0.5757826089859008


Train, Epoch 2 / 20:  71%|███████▏  | 1116/1563 [00:31<00:12, 36.79it/s]

batch 1110 loss: 0.5719537675380707


Train, Epoch 2 / 20:  72%|███████▏  | 1124/1563 [00:31<00:11, 37.01it/s]

batch 1120 loss: 0.6097920089960098


Train, Epoch 2 / 20:  73%|███████▎  | 1136/1563 [00:31<00:11, 37.22it/s]

batch 1130 loss: 0.6022000312805176


Train, Epoch 2 / 20:  73%|███████▎  | 1144/1563 [00:32<00:11, 36.76it/s]

batch 1140 loss: 0.6152333676815033


Train, Epoch 2 / 20:  74%|███████▍  | 1156/1563 [00:32<00:11, 35.91it/s]

batch 1150 loss: 0.5955264955759049


Train, Epoch 2 / 20:  74%|███████▍  | 1164/1563 [00:32<00:10, 36.28it/s]

batch 1160 loss: 0.5998523861169816


Train, Epoch 2 / 20:  75%|███████▌  | 1176/1563 [00:33<00:10, 36.84it/s]

batch 1170 loss: 0.6068512529134751


Train, Epoch 2 / 20:  76%|███████▌  | 1184/1563 [00:33<00:10, 36.62it/s]

batch 1180 loss: 0.4993733584880829


Train, Epoch 2 / 20:  77%|███████▋  | 1196/1563 [00:33<00:09, 36.96it/s]

batch 1190 loss: 0.6206141978502273


Train, Epoch 2 / 20:  77%|███████▋  | 1204/1563 [00:33<00:09, 36.62it/s]

batch 1200 loss: 0.6122417330741883


Train, Epoch 2 / 20:  78%|███████▊  | 1216/1563 [00:34<00:09, 36.89it/s]

batch 1210 loss: 0.6270175278186798


Train, Epoch 2 / 20:  78%|███████▊  | 1224/1563 [00:34<00:09, 36.88it/s]

batch 1220 loss: 0.6397241950035095


Train, Epoch 2 / 20:  79%|███████▉  | 1236/1563 [00:34<00:08, 37.15it/s]

batch 1230 loss: 0.6044781804084778


Train, Epoch 2 / 20:  80%|███████▉  | 1244/1563 [00:34<00:08, 36.68it/s]

batch 1240 loss: 0.6222001373767853


Train, Epoch 2 / 20:  80%|████████  | 1256/1563 [00:35<00:08, 37.15it/s]

batch 1250 loss: 0.5496773779392242


Train, Epoch 2 / 20:  81%|████████  | 1264/1563 [00:35<00:08, 36.84it/s]

batch 1260 loss: 0.5556348353624344


Train, Epoch 2 / 20:  82%|████████▏ | 1276/1563 [00:35<00:07, 35.94it/s]

batch 1270 loss: 0.597113773226738


Train, Epoch 2 / 20:  82%|████████▏ | 1284/1563 [00:35<00:07, 36.15it/s]

batch 1280 loss: 0.5890358299016952


Train, Epoch 2 / 20:  83%|████████▎ | 1296/1563 [00:36<00:07, 36.17it/s]

batch 1290 loss: 0.5821835577487946


Train, Epoch 2 / 20:  83%|████████▎ | 1304/1563 [00:36<00:07, 35.54it/s]

batch 1300 loss: 0.5022913783788681


Train, Epoch 2 / 20:  84%|████████▍ | 1316/1563 [00:36<00:06, 36.31it/s]

batch 1310 loss: 0.5703900068998337


Train, Epoch 2 / 20:  85%|████████▍ | 1324/1563 [00:37<00:06, 36.05it/s]

batch 1320 loss: 0.6044491976499557


Train, Epoch 2 / 20:  85%|████████▌ | 1336/1563 [00:37<00:06, 33.82it/s]

batch 1330 loss: 0.6665011763572692


Train, Epoch 2 / 20:  86%|████████▌ | 1344/1563 [00:37<00:06, 33.50it/s]

batch 1340 loss: 0.5648221850395203


Train, Epoch 2 / 20:  87%|████████▋ | 1356/1563 [00:38<00:06, 32.31it/s]

batch 1350 loss: 0.5702930569648743


Train, Epoch 2 / 20:  87%|████████▋ | 1364/1563 [00:38<00:06, 32.47it/s]

batch 1360 loss: 0.6104366034269333


Train, Epoch 2 / 20:  88%|████████▊ | 1376/1563 [00:38<00:05, 32.88it/s]

batch 1370 loss: 0.538163161277771


Train, Epoch 2 / 20:  89%|████████▊ | 1384/1563 [00:38<00:05, 32.23it/s]

batch 1380 loss: 0.5804875284433365


Train, Epoch 2 / 20:  89%|████████▉ | 1396/1563 [00:39<00:05, 33.35it/s]

batch 1390 loss: 0.5794252127408981


Train, Epoch 2 / 20:  90%|████████▉ | 1404/1563 [00:39<00:04, 34.71it/s]

batch 1400 loss: 0.5627176105976105


Train, Epoch 2 / 20:  91%|█████████ | 1416/1563 [00:39<00:04, 35.86it/s]

batch 1410 loss: 0.5707271277904511


Train, Epoch 2 / 20:  91%|█████████ | 1424/1563 [00:40<00:03, 36.35it/s]

batch 1420 loss: 0.6387507975101471


Train, Epoch 2 / 20:  92%|█████████▏| 1436/1563 [00:40<00:03, 36.44it/s]

batch 1430 loss: 0.6691709190607071


Train, Epoch 2 / 20:  92%|█████████▏| 1444/1563 [00:40<00:03, 36.81it/s]

batch 1440 loss: 0.6052499443292618


Train, Epoch 2 / 20:  93%|█████████▎| 1456/1563 [00:40<00:02, 37.04it/s]

batch 1450 loss: 0.50611053109169


Train, Epoch 2 / 20:  94%|█████████▎| 1464/1563 [00:41<00:02, 36.98it/s]

batch 1460 loss: 0.5868113309144973


Train, Epoch 2 / 20:  94%|█████████▍| 1476/1563 [00:41<00:02, 36.89it/s]

batch 1470 loss: 0.6241772651672364


Train, Epoch 2 / 20:  95%|█████████▍| 1484/1563 [00:41<00:02, 36.61it/s]

batch 1480 loss: 0.5724125593900681


Train, Epoch 2 / 20:  96%|█████████▌| 1496/1563 [00:41<00:01, 36.66it/s]

batch 1490 loss: 0.5855401307344437


Train, Epoch 2 / 20:  96%|█████████▌| 1504/1563 [00:42<00:01, 36.68it/s]

batch 1500 loss: 0.533375808596611


Train, Epoch 2 / 20:  97%|█████████▋| 1516/1563 [00:42<00:01, 36.84it/s]

batch 1510 loss: 0.5267640560865402


Train, Epoch 2 / 20:  98%|█████████▊| 1524/1563 [00:42<00:01, 36.74it/s]

batch 1520 loss: 0.5798900336027145


Train, Epoch 2 / 20:  98%|█████████▊| 1536/1563 [00:43<00:00, 36.69it/s]

batch 1530 loss: 0.5092031270265579


Train, Epoch 2 / 20:  99%|█████████▉| 1544/1563 [00:43<00:00, 36.83it/s]

batch 1540 loss: 0.6890248239040375


Train, Epoch 2 / 20: 100%|█████████▉| 1556/1563 [00:43<00:00, 37.01it/s]

batch 1550 loss: 0.6196154773235321


Train, Epoch 2 / 20: 100%|██████████| 1563/1563 [00:43<00:00, 35.68it/s]


batch 1560 loss: 0.5598818391561509


Test, Epoch 2 / 20: 100%|██████████| 1563/1563 [00:21<00:00, 74.24it/s]


Epoch 2, loss: 0.5668339790678024, accuracy: 0.70084


Train, Epoch 3 / 20:   1%|          | 16/1563 [00:00<00:43, 35.35it/s]

batch 10 loss: 0.5632823914289474


Train, Epoch 3 / 20:   2%|▏         | 24/1563 [00:00<00:43, 35.13it/s]

batch 20 loss: 0.587927657365799


Train, Epoch 3 / 20:   2%|▏         | 36/1563 [00:01<00:43, 35.48it/s]

batch 30 loss: 0.5659753262996674


Train, Epoch 3 / 20:   3%|▎         | 44/1563 [00:01<00:42, 35.59it/s]

batch 40 loss: 0.5419257372617722


Train, Epoch 3 / 20:   4%|▎         | 56/1563 [00:01<00:42, 35.54it/s]

batch 50 loss: 0.5404174536466598


Train, Epoch 3 / 20:   4%|▍         | 64/1563 [00:01<00:42, 34.99it/s]

batch 60 loss: 0.5996215403079986


Train, Epoch 3 / 20:   5%|▍         | 76/1563 [00:02<00:42, 34.96it/s]

batch 70 loss: 0.6336149871349335


Train, Epoch 3 / 20:   5%|▌         | 84/1563 [00:02<00:41, 35.54it/s]

batch 80 loss: 0.530036774277687


Train, Epoch 3 / 20:   6%|▌         | 96/1563 [00:02<00:41, 35.37it/s]

batch 90 loss: 0.5625097185373307


Train, Epoch 3 / 20:   7%|▋         | 104/1563 [00:02<00:40, 35.83it/s]

batch 100 loss: 0.5481215000152588


Train, Epoch 3 / 20:   7%|▋         | 116/1563 [00:03<00:41, 35.29it/s]

batch 110 loss: 0.5782339006662369


Train, Epoch 3 / 20:   8%|▊         | 124/1563 [00:03<00:40, 35.51it/s]

batch 120 loss: 0.512179383635521


Train, Epoch 3 / 20:   9%|▊         | 136/1563 [00:03<00:39, 35.92it/s]

batch 130 loss: 0.5582993149757385


Train, Epoch 3 / 20:   9%|▉         | 144/1563 [00:04<00:39, 35.65it/s]

batch 140 loss: 0.5163566887378692


Train, Epoch 3 / 20:  10%|▉         | 156/1563 [00:04<00:38, 36.20it/s]

batch 150 loss: 0.5102536499500274


Train, Epoch 3 / 20:  10%|█         | 164/1563 [00:04<00:38, 35.99it/s]

batch 160 loss: 0.5804806917905807


Train, Epoch 3 / 20:  11%|█▏        | 176/1563 [00:04<00:38, 35.81it/s]

batch 170 loss: 0.5404229968786239


Train, Epoch 3 / 20:  12%|█▏        | 184/1563 [00:05<00:38, 35.86it/s]

batch 180 loss: 0.5348264276981354


Train, Epoch 3 / 20:  13%|█▎        | 196/1563 [00:05<00:37, 36.00it/s]

batch 190 loss: 0.5750449568033218


Train, Epoch 3 / 20:  13%|█▎        | 204/1563 [00:05<00:38, 35.30it/s]

batch 200 loss: 0.5867150872945786


Train, Epoch 3 / 20:  14%|█▍        | 216/1563 [00:06<00:37, 35.96it/s]

batch 210 loss: 0.5314448654651642


Train, Epoch 3 / 20:  14%|█▍        | 224/1563 [00:06<00:37, 36.00it/s]

batch 220 loss: 0.558706772327423


Train, Epoch 3 / 20:  15%|█▌        | 236/1563 [00:06<00:36, 36.46it/s]

batch 230 loss: 0.5467156618833542


Train, Epoch 3 / 20:  16%|█▌        | 244/1563 [00:06<00:36, 36.16it/s]

batch 240 loss: 0.5930550992488861


Train, Epoch 3 / 20:  16%|█▋        | 256/1563 [00:07<00:35, 36.33it/s]

batch 250 loss: 0.5593098253011703


Train, Epoch 3 / 20:  17%|█▋        | 264/1563 [00:07<00:35, 36.58it/s]

batch 260 loss: 0.6135081976652146


Train, Epoch 3 / 20:  18%|█▊        | 276/1563 [00:07<00:35, 36.21it/s]

batch 270 loss: 0.5174893647432327


Train, Epoch 3 / 20:  18%|█▊        | 284/1563 [00:07<00:35, 35.59it/s]

batch 280 loss: 0.4930004715919495


Train, Epoch 3 / 20:  19%|█▉        | 296/1563 [00:08<00:35, 36.04it/s]

batch 290 loss: 0.5608818382024765


Train, Epoch 3 / 20:  19%|█▉        | 304/1563 [00:08<00:34, 36.38it/s]

batch 300 loss: 0.5616070955991745


Train, Epoch 3 / 20:  20%|██        | 316/1563 [00:08<00:37, 33.63it/s]

batch 310 loss: 0.5671830654144288


Train, Epoch 3 / 20:  21%|██        | 324/1563 [00:09<00:36, 33.89it/s]

batch 320 loss: 0.5467169433832169


Train, Epoch 3 / 20:  21%|██▏       | 336/1563 [00:09<00:35, 34.24it/s]

batch 330 loss: 0.5421434968709946


Train, Epoch 3 / 20:  22%|██▏       | 344/1563 [00:09<00:36, 33.58it/s]

batch 340 loss: 0.5384325474500656


Train, Epoch 3 / 20:  23%|██▎       | 356/1563 [00:10<00:36, 33.11it/s]

batch 350 loss: 0.5687884658575058


Train, Epoch 3 / 20:  23%|██▎       | 364/1563 [00:10<00:36, 32.88it/s]

batch 360 loss: 0.575029319524765


Train, Epoch 3 / 20:  24%|██▍       | 376/1563 [00:10<00:36, 32.38it/s]

batch 370 loss: 0.5222044378519058


Train, Epoch 3 / 20:  25%|██▍       | 384/1563 [00:10<00:34, 33.84it/s]

batch 380 loss: 0.5617320358753204


Train, Epoch 3 / 20:  25%|██▌       | 396/1563 [00:11<00:32, 35.52it/s]

batch 390 loss: 0.5251877725124359


Train, Epoch 3 / 20:  26%|██▌       | 404/1563 [00:11<00:32, 36.01it/s]

batch 400 loss: 0.5121213287115097


Train, Epoch 3 / 20:  27%|██▋       | 416/1563 [00:11<00:32, 35.81it/s]

batch 410 loss: 0.5714632570743561


Train, Epoch 3 / 20:  27%|██▋       | 424/1563 [00:12<00:31, 35.73it/s]

batch 420 loss: 0.5900665372610092


Train, Epoch 3 / 20:  28%|██▊       | 436/1563 [00:12<00:31, 35.99it/s]

batch 430 loss: 0.5808881640434265


Train, Epoch 3 / 20:  28%|██▊       | 444/1563 [00:12<00:30, 36.29it/s]

batch 440 loss: 0.5765612840652465


Train, Epoch 3 / 20:  29%|██▉       | 456/1563 [00:12<00:30, 36.64it/s]

batch 450 loss: 0.6289682924747467


Train, Epoch 3 / 20:  30%|██▉       | 464/1563 [00:13<00:29, 36.64it/s]

batch 460 loss: 0.5630977064371109


Train, Epoch 3 / 20:  30%|███       | 476/1563 [00:13<00:29, 36.83it/s]

batch 470 loss: 0.5541498541831971


Train, Epoch 3 / 20:  31%|███       | 484/1563 [00:13<00:29, 36.63it/s]

batch 480 loss: 0.563787403702736


Train, Epoch 3 / 20:  32%|███▏      | 496/1563 [00:14<00:29, 36.64it/s]

batch 490 loss: 0.549153870344162


Train, Epoch 3 / 20:  32%|███▏      | 504/1563 [00:14<00:29, 36.49it/s]

batch 500 loss: 0.4996975213289261


Train, Epoch 3 / 20:  33%|███▎      | 516/1563 [00:14<00:29, 35.97it/s]

batch 510 loss: 0.5234148442745209


Train, Epoch 3 / 20:  34%|███▎      | 524/1563 [00:14<00:29, 35.77it/s]

batch 520 loss: 0.5549486547708511


Train, Epoch 3 / 20:  34%|███▍      | 536/1563 [00:15<00:29, 35.19it/s]

batch 530 loss: 0.5670996278524398


Train, Epoch 3 / 20:  35%|███▍      | 544/1563 [00:15<00:28, 35.65it/s]

batch 540 loss: 0.6015529602766037


Train, Epoch 3 / 20:  36%|███▌      | 556/1563 [00:15<00:27, 36.19it/s]

batch 550 loss: 0.49965350329875946


Train, Epoch 3 / 20:  36%|███▌      | 564/1563 [00:15<00:27, 36.55it/s]

batch 560 loss: 0.5353044390678405


Train, Epoch 3 / 20:  37%|███▋      | 576/1563 [00:16<00:27, 35.38it/s]

batch 570 loss: 0.4769325256347656


Train, Epoch 3 / 20:  37%|███▋      | 584/1563 [00:16<00:27, 35.91it/s]

batch 580 loss: 0.5551953345537186


Train, Epoch 3 / 20:  38%|███▊      | 596/1563 [00:16<00:26, 36.07it/s]

batch 590 loss: 0.5526608794927597


Train, Epoch 3 / 20:  39%|███▊      | 604/1563 [00:17<00:26, 36.35it/s]

batch 600 loss: 0.5268593162298203


Train, Epoch 3 / 20:  39%|███▉      | 616/1563 [00:17<00:26, 35.69it/s]

batch 610 loss: 0.6252463102340698


Train, Epoch 3 / 20:  40%|███▉      | 624/1563 [00:17<00:26, 35.50it/s]

batch 620 loss: 0.5561811566352844


Train, Epoch 3 / 20:  41%|████      | 636/1563 [00:17<00:26, 35.22it/s]

batch 630 loss: 0.5473146289587021


Train, Epoch 3 / 20:  41%|████      | 644/1563 [00:18<00:25, 35.63it/s]

batch 640 loss: 0.5361000239849091


Train, Epoch 3 / 20:  42%|████▏     | 656/1563 [00:18<00:25, 35.95it/s]

batch 650 loss: 0.5737464398145675


Train, Epoch 3 / 20:  42%|████▏     | 664/1563 [00:18<00:24, 36.05it/s]

batch 660 loss: 0.53104667365551


Train, Epoch 3 / 20:  43%|████▎     | 676/1563 [00:19<00:24, 35.88it/s]

batch 670 loss: 0.5367096334695816


Train, Epoch 3 / 20:  44%|████▍     | 684/1563 [00:19<00:24, 35.29it/s]

batch 680 loss: 0.49921135902404784


Train, Epoch 3 / 20:  45%|████▍     | 696/1563 [00:19<00:24, 35.46it/s]

batch 690 loss: 0.5166571229696274


Train, Epoch 3 / 20:  45%|████▌     | 704/1563 [00:19<00:23, 35.91it/s]

batch 700 loss: 0.5580120831727982


Train, Epoch 3 / 20:  46%|████▌     | 716/1563 [00:20<00:23, 36.08it/s]

batch 710 loss: 0.5500278383493423


Train, Epoch 3 / 20:  46%|████▋     | 724/1563 [00:20<00:23, 36.08it/s]

batch 720 loss: 0.6230413109064102


Train, Epoch 3 / 20:  47%|████▋     | 736/1563 [00:20<00:23, 35.93it/s]

batch 730 loss: 0.5597321361303329


Train, Epoch 3 / 20:  48%|████▊     | 744/1563 [00:20<00:23, 34.72it/s]

batch 740 loss: 0.513798001408577


Train, Epoch 3 / 20:  48%|████▊     | 756/1563 [00:21<00:23, 33.96it/s]

batch 750 loss: 0.6049502402544021


Train, Epoch 3 / 20:  49%|████▉     | 764/1563 [00:21<00:23, 34.35it/s]

batch 760 loss: 0.48628620207309725


Train, Epoch 3 / 20:  50%|████▉     | 776/1563 [00:21<00:22, 34.32it/s]

batch 770 loss: 0.5215316623449325


Train, Epoch 3 / 20:  50%|█████     | 784/1563 [00:22<00:23, 33.72it/s]

batch 780 loss: 0.542555621266365


Train, Epoch 3 / 20:  51%|█████     | 796/1563 [00:22<00:23, 32.95it/s]

batch 790 loss: 0.5450442850589752


Train, Epoch 3 / 20:  51%|█████▏    | 804/1563 [00:22<00:22, 33.19it/s]

batch 800 loss: 0.5485602915287018


Train, Epoch 3 / 20:  52%|█████▏    | 816/1563 [00:23<00:21, 34.49it/s]

batch 810 loss: 0.5175459861755372


Train, Epoch 3 / 20:  53%|█████▎    | 824/1563 [00:23<00:21, 35.08it/s]

batch 820 loss: 0.5614874005317688


Train, Epoch 3 / 20:  53%|█████▎    | 836/1563 [00:23<00:20, 36.17it/s]

batch 830 loss: 0.5798921078443527


Train, Epoch 3 / 20:  54%|█████▍    | 844/1563 [00:23<00:20, 35.81it/s]

batch 840 loss: 0.5011611640453338


Train, Epoch 3 / 20:  55%|█████▍    | 856/1563 [00:24<00:19, 35.93it/s]

batch 850 loss: 0.49784114956855774


Train, Epoch 3 / 20:  55%|█████▌    | 864/1563 [00:24<00:19, 35.56it/s]

batch 860 loss: 0.4587388694286346


Train, Epoch 3 / 20:  56%|█████▌    | 876/1563 [00:24<00:19, 35.42it/s]

batch 870 loss: 0.5158325612545014


Train, Epoch 3 / 20:  57%|█████▋    | 884/1563 [00:24<00:19, 35.71it/s]

batch 880 loss: 0.5510516315698624


Train, Epoch 3 / 20:  57%|█████▋    | 896/1563 [00:25<00:18, 36.23it/s]

batch 890 loss: 0.5033622562885285


Train, Epoch 3 / 20:  58%|█████▊    | 904/1563 [00:25<00:18, 36.32it/s]

batch 900 loss: 0.4999219238758087


Train, Epoch 3 / 20:  59%|█████▊    | 916/1563 [00:25<00:17, 36.57it/s]

batch 910 loss: 0.5241882681846619


Train, Epoch 3 / 20:  59%|█████▉    | 924/1563 [00:26<00:17, 36.80it/s]

batch 920 loss: 0.5578957051038742


Train, Epoch 3 / 20:  60%|█████▉    | 936/1563 [00:26<00:17, 36.59it/s]

batch 930 loss: 0.6226192086935043


Train, Epoch 3 / 20:  60%|██████    | 944/1563 [00:26<00:16, 36.61it/s]

batch 940 loss: 0.5619463145732879


Train, Epoch 3 / 20:  61%|██████    | 956/1563 [00:26<00:16, 36.54it/s]

batch 950 loss: 0.4934003472328186


Train, Epoch 3 / 20:  62%|██████▏   | 964/1563 [00:27<00:16, 36.52it/s]

batch 960 loss: 0.5679546922445298


Train, Epoch 3 / 20:  62%|██████▏   | 976/1563 [00:27<00:16, 36.37it/s]

batch 970 loss: 0.5277221858501434


Train, Epoch 3 / 20:  63%|██████▎   | 984/1563 [00:27<00:15, 36.36it/s]

batch 980 loss: 0.5872358918190003


Train, Epoch 3 / 20:  64%|██████▎   | 996/1563 [00:28<00:15, 36.21it/s]

batch 990 loss: 0.5385113269090652


Train, Epoch 3 / 20:  64%|██████▍   | 1004/1563 [00:28<00:15, 36.32it/s]

batch 1000 loss: 0.4934824824333191


Train, Epoch 3 / 20:  65%|██████▌   | 1016/1563 [00:28<00:15, 35.92it/s]

batch 1010 loss: 0.5302934676408768


Train, Epoch 3 / 20:  66%|██████▌   | 1024/1563 [00:28<00:14, 36.13it/s]

batch 1020 loss: 0.5303774386644363


Train, Epoch 3 / 20:  66%|██████▋   | 1036/1563 [00:29<00:14, 36.14it/s]

batch 1030 loss: 0.5569625973701477


Train, Epoch 3 / 20:  67%|██████▋   | 1044/1563 [00:29<00:14, 35.85it/s]

batch 1040 loss: 0.5039101719856263


Train, Epoch 3 / 20:  68%|██████▊   | 1056/1563 [00:29<00:14, 35.88it/s]

batch 1050 loss: 0.5265418350696563


Train, Epoch 3 / 20:  68%|██████▊   | 1064/1563 [00:29<00:13, 35.96it/s]

batch 1060 loss: 0.5398173928260803


Train, Epoch 3 / 20:  69%|██████▉   | 1076/1563 [00:30<00:13, 35.44it/s]

batch 1070 loss: 0.5265325903892517


Train, Epoch 3 / 20:  69%|██████▉   | 1084/1563 [00:30<00:13, 35.58it/s]

batch 1080 loss: 0.5211444854736328


Train, Epoch 3 / 20:  70%|███████   | 1096/1563 [00:30<00:12, 36.38it/s]

batch 1090 loss: 0.5165123611688613


Train, Epoch 3 / 20:  71%|███████   | 1104/1563 [00:31<00:12, 36.29it/s]

batch 1100 loss: 0.5445104241371155


Train, Epoch 3 / 20:  71%|███████▏  | 1116/1563 [00:31<00:12, 36.61it/s]

batch 1110 loss: 0.599040886759758


Train, Epoch 3 / 20:  72%|███████▏  | 1124/1563 [00:31<00:12, 35.99it/s]

batch 1120 loss: 0.5397619396448136


Train, Epoch 3 / 20:  73%|███████▎  | 1136/1563 [00:31<00:11, 35.99it/s]

batch 1130 loss: 0.533537071943283


Train, Epoch 3 / 20:  73%|███████▎  | 1144/1563 [00:32<00:11, 36.14it/s]

batch 1140 loss: 0.48127255737781527


Train, Epoch 3 / 20:  74%|███████▍  | 1156/1563 [00:32<00:11, 36.31it/s]

batch 1150 loss: 0.47876068353652956


Train, Epoch 3 / 20:  74%|███████▍  | 1164/1563 [00:32<00:10, 36.60it/s]

batch 1160 loss: 0.5354716539382934


Train, Epoch 3 / 20:  75%|███████▌  | 1176/1563 [00:33<00:11, 34.13it/s]

batch 1170 loss: 0.6441799938678742


Train, Epoch 3 / 20:  76%|███████▌  | 1184/1563 [00:33<00:11, 33.71it/s]

batch 1180 loss: 0.5862886816263199


Train, Epoch 3 / 20:  77%|███████▋  | 1196/1563 [00:33<00:11, 32.46it/s]

batch 1190 loss: 0.5511905252933502


Train, Epoch 3 / 20:  77%|███████▋  | 1204/1563 [00:33<00:10, 32.76it/s]

batch 1200 loss: 0.5124018579721451


Train, Epoch 3 / 20:  78%|███████▊  | 1216/1563 [00:34<00:10, 32.76it/s]

batch 1210 loss: 0.610664838552475


Train, Epoch 3 / 20:  78%|███████▊  | 1224/1563 [00:34<00:10, 32.94it/s]

batch 1220 loss: 0.5875932812690735


Train, Epoch 3 / 20:  79%|███████▉  | 1236/1563 [00:34<00:09, 32.75it/s]

batch 1230 loss: 0.5450990110635757


Train, Epoch 3 / 20:  80%|███████▉  | 1244/1563 [00:35<00:09, 34.08it/s]

batch 1240 loss: 0.5605185896158218


Train, Epoch 3 / 20:  80%|████████  | 1256/1563 [00:35<00:08, 35.68it/s]

batch 1250 loss: 0.5506832033395768


Train, Epoch 3 / 20:  81%|████████  | 1264/1563 [00:35<00:08, 35.68it/s]

batch 1260 loss: 0.49030342102050783


Train, Epoch 3 / 20:  82%|████████▏ | 1276/1563 [00:36<00:07, 36.43it/s]

batch 1270 loss: 0.5497186958789826


Train, Epoch 3 / 20:  82%|████████▏ | 1284/1563 [00:36<00:07, 36.26it/s]

batch 1280 loss: 0.49593801498413087


Train, Epoch 3 / 20:  83%|████████▎ | 1296/1563 [00:36<00:07, 36.42it/s]

batch 1290 loss: 0.5398935198783874


Train, Epoch 3 / 20:  83%|████████▎ | 1304/1563 [00:36<00:07, 36.35it/s]

batch 1300 loss: 0.5195113152265549


Train, Epoch 3 / 20:  84%|████████▍ | 1316/1563 [00:37<00:06, 36.40it/s]

batch 1310 loss: 0.5931191772222519


Train, Epoch 3 / 20:  85%|████████▍ | 1324/1563 [00:37<00:06, 36.42it/s]

batch 1320 loss: 0.5669297099113464


Train, Epoch 3 / 20:  85%|████████▌ | 1336/1563 [00:37<00:06, 36.63it/s]

batch 1330 loss: 0.584836283326149


Train, Epoch 3 / 20:  86%|████████▌ | 1344/1563 [00:37<00:06, 36.33it/s]

batch 1340 loss: 0.5408933669328689


Train, Epoch 3 / 20:  87%|████████▋ | 1356/1563 [00:38<00:05, 36.78it/s]

batch 1350 loss: 0.5177632123231888


Train, Epoch 3 / 20:  87%|████████▋ | 1364/1563 [00:38<00:05, 36.76it/s]

batch 1360 loss: 0.6030507922172547


Train, Epoch 3 / 20:  88%|████████▊ | 1376/1563 [00:38<00:05, 36.76it/s]

batch 1370 loss: 0.5974538058042527


Train, Epoch 3 / 20:  89%|████████▊ | 1384/1563 [00:38<00:04, 36.58it/s]

batch 1380 loss: 0.5757475733757019


Train, Epoch 3 / 20:  89%|████████▉ | 1396/1563 [00:39<00:04, 36.55it/s]

batch 1390 loss: 0.530045285820961


Train, Epoch 3 / 20:  90%|████████▉ | 1404/1563 [00:39<00:04, 36.43it/s]

batch 1400 loss: 0.5899303913116455


Train, Epoch 3 / 20:  91%|█████████ | 1416/1563 [00:39<00:04, 35.58it/s]

batch 1410 loss: 0.5281086772680282


Train, Epoch 3 / 20:  91%|█████████ | 1424/1563 [00:40<00:03, 36.12it/s]

batch 1420 loss: 0.4978815734386444


Train, Epoch 3 / 20:  92%|█████████▏| 1436/1563 [00:40<00:03, 36.09it/s]

batch 1430 loss: 0.5231970995664597


Train, Epoch 3 / 20:  92%|█████████▏| 1444/1563 [00:40<00:03, 36.19it/s]

batch 1440 loss: 0.5674686521291733


Train, Epoch 3 / 20:  93%|█████████▎| 1456/1563 [00:40<00:02, 36.33it/s]

batch 1450 loss: 0.5666150361299515


Train, Epoch 3 / 20:  94%|█████████▎| 1464/1563 [00:41<00:02, 35.86it/s]

batch 1460 loss: 0.5183138519525528


Train, Epoch 3 / 20:  94%|█████████▍| 1476/1563 [00:41<00:02, 35.80it/s]

batch 1470 loss: 0.4782890915870667


Train, Epoch 3 / 20:  95%|█████████▍| 1484/1563 [00:41<00:02, 35.75it/s]

batch 1480 loss: 0.5251066356897354


Train, Epoch 3 / 20:  96%|█████████▌| 1496/1563 [00:42<00:01, 36.30it/s]

batch 1490 loss: 0.5634129077196122


Train, Epoch 3 / 20:  96%|█████████▌| 1504/1563 [00:42<00:01, 36.20it/s]

batch 1500 loss: 0.5298899650573731


Train, Epoch 3 / 20:  97%|█████████▋| 1516/1563 [00:42<00:01, 36.53it/s]

batch 1510 loss: 0.5369325786828995


Train, Epoch 3 / 20:  98%|█████████▊| 1524/1563 [00:42<00:01, 36.71it/s]

batch 1520 loss: 0.5327377796173096


Train, Epoch 3 / 20:  98%|█████████▊| 1536/1563 [00:43<00:00, 36.32it/s]

batch 1530 loss: 0.5232072383165359


Train, Epoch 3 / 20:  99%|█████████▉| 1544/1563 [00:43<00:00, 36.55it/s]

batch 1540 loss: 0.6183259844779968


Train, Epoch 3 / 20: 100%|█████████▉| 1556/1563 [00:43<00:00, 36.33it/s]

batch 1550 loss: 0.5314951598644256


Train, Epoch 3 / 20: 100%|██████████| 1563/1563 [00:43<00:00, 35.59it/s]


batch 1560 loss: 0.5563854455947876


Test, Epoch 3 / 20: 100%|██████████| 1563/1563 [00:21<00:00, 74.22it/s]


Epoch 3, loss: 0.5369047594308853, accuracy: 0.72796


Train, Epoch 4 / 20:   1%|          | 16/1563 [00:00<00:43, 35.41it/s]

batch 10 loss: 0.4994078278541565


Train, Epoch 4 / 20:   2%|▏         | 24/1563 [00:00<00:43, 35.65it/s]

batch 20 loss: 0.6434419542551041


Train, Epoch 4 / 20:   2%|▏         | 36/1563 [00:01<00:43, 35.26it/s]

batch 30 loss: 0.5112132340669632


Train, Epoch 4 / 20:   3%|▎         | 44/1563 [00:01<00:42, 35.67it/s]

batch 40 loss: 0.5692048400640488


Train, Epoch 4 / 20:   4%|▎         | 56/1563 [00:01<00:42, 35.80it/s]

batch 50 loss: 0.5643846392631531


Train, Epoch 4 / 20:   4%|▍         | 64/1563 [00:01<00:41, 36.17it/s]

batch 60 loss: 0.49329245984554293


Train, Epoch 4 / 20:   5%|▍         | 76/1563 [00:02<00:40, 36.61it/s]

batch 70 loss: 0.48019647151231765


Train, Epoch 4 / 20:   5%|▌         | 84/1563 [00:02<00:40, 36.45it/s]

batch 80 loss: 0.5123453885316849


Train, Epoch 4 / 20:   6%|▌         | 96/1563 [00:02<00:41, 35.58it/s]

batch 90 loss: 0.529398825764656


Train, Epoch 4 / 20:   7%|▋         | 104/1563 [00:02<00:40, 35.94it/s]

batch 100 loss: 0.5388110041618347


Train, Epoch 4 / 20:   7%|▋         | 116/1563 [00:03<00:40, 35.91it/s]

batch 110 loss: 0.5698142021894455


Train, Epoch 4 / 20:   8%|▊         | 124/1563 [00:03<00:39, 36.44it/s]

batch 120 loss: 0.5189756363630295


Train, Epoch 4 / 20:   9%|▊         | 136/1563 [00:03<00:39, 36.57it/s]

batch 130 loss: 0.5512435913085938


Train, Epoch 4 / 20:   9%|▉         | 144/1563 [00:04<00:39, 36.20it/s]

batch 140 loss: 0.48624767959117887


Train, Epoch 4 / 20:  10%|▉         | 156/1563 [00:04<00:39, 35.60it/s]

batch 150 loss: 0.5156253010034562


Train, Epoch 4 / 20:  10%|█         | 164/1563 [00:04<00:40, 34.57it/s]

batch 160 loss: 0.5222564756870269


Train, Epoch 4 / 20:  11%|█▏        | 176/1563 [00:04<00:41, 33.27it/s]

batch 170 loss: 0.4272927284240723


Train, Epoch 4 / 20:  12%|█▏        | 184/1563 [00:05<00:41, 33.09it/s]

batch 180 loss: 0.501341950893402


Train, Epoch 4 / 20:  13%|█▎        | 196/1563 [00:05<00:41, 33.12it/s]

batch 190 loss: 0.5751136660575866


Train, Epoch 4 / 20:  13%|█▎        | 204/1563 [00:05<00:42, 32.18it/s]

batch 200 loss: 0.5460467129945755


Train, Epoch 4 / 20:  14%|█▍        | 216/1563 [00:06<00:41, 32.50it/s]

batch 210 loss: 0.508243665099144


Train, Epoch 4 / 20:  14%|█▍        | 224/1563 [00:06<00:41, 32.50it/s]

batch 220 loss: 0.5280725300312042


Train, Epoch 4 / 20:  15%|█▌        | 236/1563 [00:06<00:38, 34.31it/s]

batch 230 loss: 0.4347725987434387


Train, Epoch 4 / 20:  16%|█▌        | 244/1563 [00:06<00:37, 35.18it/s]

batch 240 loss: 0.5196432173252106


Train, Epoch 4 / 20:  16%|█▋        | 256/1563 [00:07<00:36, 35.84it/s]

batch 250 loss: 0.5451947540044785


Train, Epoch 4 / 20:  17%|█▋        | 264/1563 [00:07<00:36, 36.06it/s]

batch 260 loss: 0.5072270542383194


Train, Epoch 4 / 20:  18%|█▊        | 276/1563 [00:07<00:35, 36.08it/s]

batch 270 loss: 0.5003728538751602


Train, Epoch 4 / 20:  18%|█▊        | 284/1563 [00:08<00:35, 36.12it/s]

batch 280 loss: 0.5534784168004989


Train, Epoch 4 / 20:  19%|█▉        | 296/1563 [00:08<00:34, 36.24it/s]

batch 290 loss: 0.5103324145078659


Train, Epoch 4 / 20:  19%|█▉        | 304/1563 [00:08<00:34, 36.43it/s]

batch 300 loss: 0.5690458625555038


Train, Epoch 4 / 20:  20%|██        | 316/1563 [00:08<00:34, 36.32it/s]

batch 310 loss: 0.5887678533792495


Train, Epoch 4 / 20:  21%|██        | 324/1563 [00:09<00:34, 35.66it/s]

batch 320 loss: 0.5259397894144058


Train, Epoch 4 / 20:  21%|██▏       | 336/1563 [00:09<00:35, 35.05it/s]

batch 330 loss: 0.5248598903417587


Train, Epoch 4 / 20:  22%|██▏       | 344/1563 [00:09<00:34, 35.11it/s]

batch 340 loss: 0.5664726465940475


Train, Epoch 4 / 20:  23%|██▎       | 356/1563 [00:10<00:33, 35.51it/s]

batch 350 loss: 0.4599637299776077


Train, Epoch 4 / 20:  23%|██▎       | 364/1563 [00:10<00:33, 35.86it/s]

batch 360 loss: 0.48270185589790343


Train, Epoch 4 / 20:  24%|██▍       | 376/1563 [00:10<00:32, 36.56it/s]

batch 370 loss: 0.468876039981842


Train, Epoch 4 / 20:  25%|██▍       | 384/1563 [00:10<00:32, 36.33it/s]

batch 380 loss: 0.49758165776729585


Train, Epoch 4 / 20:  25%|██▌       | 396/1563 [00:11<00:32, 36.29it/s]

batch 390 loss: 0.47855025827884673


Train, Epoch 4 / 20:  26%|██▌       | 404/1563 [00:11<00:31, 36.35it/s]

batch 400 loss: 0.5010908484458924


Train, Epoch 4 / 20:  27%|██▋       | 416/1563 [00:11<00:31, 36.17it/s]

batch 410 loss: 0.45913874804973603


Train, Epoch 4 / 20:  27%|██▋       | 424/1563 [00:11<00:32, 35.29it/s]

batch 420 loss: 0.5376039117574691


Train, Epoch 4 / 20:  28%|██▊       | 436/1563 [00:12<00:31, 35.33it/s]

batch 430 loss: 0.49572872519493105


Train, Epoch 4 / 20:  28%|██▊       | 444/1563 [00:12<00:31, 35.73it/s]

batch 440 loss: 0.5907208770513535


Train, Epoch 4 / 20:  29%|██▉       | 456/1563 [00:12<00:31, 35.55it/s]

batch 450 loss: 0.5748094469308853


Train, Epoch 4 / 20:  30%|██▉       | 464/1563 [00:13<00:30, 35.46it/s]

batch 460 loss: 0.4948898911476135


Train, Epoch 4 / 20:  30%|███       | 476/1563 [00:13<00:30, 36.22it/s]

batch 470 loss: 0.5019407480955124


Train, Epoch 4 / 20:  31%|███       | 484/1563 [00:13<00:29, 36.75it/s]

batch 480 loss: 0.46212278604507445


Train, Epoch 4 / 20:  32%|███▏      | 496/1563 [00:13<00:29, 36.64it/s]

batch 490 loss: 0.48992650508880614


Train, Epoch 4 / 20:  32%|███▏      | 504/1563 [00:14<00:28, 36.69it/s]

batch 500 loss: 0.4576370418071747


Train, Epoch 4 / 20:  33%|███▎      | 516/1563 [00:14<00:28, 36.41it/s]

batch 510 loss: 0.5354379773139953


Train, Epoch 4 / 20:  34%|███▎      | 524/1563 [00:14<00:29, 35.64it/s]

batch 520 loss: 0.4756795674562454


Train, Epoch 4 / 20:  34%|███▍      | 536/1563 [00:15<00:28, 35.80it/s]

batch 530 loss: 0.46031094491481783


Train, Epoch 4 / 20:  35%|███▍      | 544/1563 [00:15<00:28, 36.04it/s]

batch 540 loss: 0.5259965240955353


Train, Epoch 4 / 20:  36%|███▌      | 556/1563 [00:15<00:27, 36.51it/s]

batch 550 loss: 0.5608070492744446


Train, Epoch 4 / 20:  36%|███▌      | 564/1563 [00:15<00:27, 36.54it/s]

batch 560 loss: 0.43688787519931793


Train, Epoch 4 / 20:  37%|███▋      | 576/1563 [00:16<00:27, 36.31it/s]

batch 570 loss: 0.5694393932819366


Train, Epoch 4 / 20:  37%|███▋      | 584/1563 [00:16<00:26, 36.56it/s]

batch 580 loss: 0.49103338420391085


Train, Epoch 4 / 20:  38%|███▊      | 596/1563 [00:16<00:28, 34.23it/s]

batch 590 loss: 0.5381140321493149


Train, Epoch 4 / 20:  39%|███▊      | 604/1563 [00:17<00:29, 32.79it/s]

batch 600 loss: 0.5175382316112518


Train, Epoch 4 / 20:  39%|███▉      | 616/1563 [00:17<00:28, 33.34it/s]

batch 610 loss: 0.4965175032615662


Train, Epoch 4 / 20:  40%|███▉      | 624/1563 [00:17<00:28, 32.94it/s]

batch 620 loss: 0.5561768293380738


Train, Epoch 4 / 20:  41%|████      | 636/1563 [00:18<00:29, 31.90it/s]

batch 630 loss: 0.6205794245004654


Train, Epoch 4 / 20:  41%|████      | 644/1563 [00:18<00:29, 31.50it/s]

batch 640 loss: 0.5309587091207504


Train, Epoch 4 / 20:  42%|████▏     | 656/1563 [00:18<00:28, 32.14it/s]

batch 650 loss: 0.5650813281536102


Train, Epoch 4 / 20:  42%|████▏     | 664/1563 [00:18<00:26, 33.50it/s]

batch 660 loss: 0.44689599275588987


Train, Epoch 4 / 20:  43%|████▎     | 676/1563 [00:19<00:25, 34.76it/s]

batch 670 loss: 0.5278555065393448


Train, Epoch 4 / 20:  44%|████▍     | 684/1563 [00:19<00:24, 35.50it/s]

batch 680 loss: 0.4700905054807663


Train, Epoch 4 / 20:  45%|████▍     | 696/1563 [00:19<00:24, 35.96it/s]

batch 690 loss: 0.5845996141433716


Train, Epoch 4 / 20:  45%|████▌     | 704/1563 [00:20<00:23, 36.19it/s]

batch 700 loss: 0.4685788989067078


Train, Epoch 4 / 20:  46%|████▌     | 716/1563 [00:20<00:23, 36.38it/s]

batch 710 loss: 0.464014932513237


Train, Epoch 4 / 20:  46%|████▋     | 724/1563 [00:20<00:22, 36.64it/s]

batch 720 loss: 0.5523836702108383


Train, Epoch 4 / 20:  47%|████▋     | 736/1563 [00:20<00:22, 35.99it/s]

batch 730 loss: 0.5191008538007736


Train, Epoch 4 / 20:  48%|████▊     | 744/1563 [00:21<00:22, 35.81it/s]

batch 740 loss: 0.527998760342598


Train, Epoch 4 / 20:  48%|████▊     | 756/1563 [00:21<00:22, 36.20it/s]

batch 750 loss: 0.5233257561922073


Train, Epoch 4 / 20:  49%|████▉     | 764/1563 [00:21<00:22, 35.96it/s]

batch 760 loss: 0.4344611406326294


Train, Epoch 4 / 20:  50%|████▉     | 776/1563 [00:21<00:21, 36.49it/s]

batch 770 loss: 0.5605746001005173


Train, Epoch 4 / 20:  50%|█████     | 784/1563 [00:22<00:21, 36.03it/s]

batch 780 loss: 0.46489094793796537


Train, Epoch 4 / 20:  51%|█████     | 796/1563 [00:22<00:21, 36.26it/s]

batch 790 loss: 0.5451731294393539


Train, Epoch 4 / 20:  51%|█████▏    | 804/1563 [00:22<00:20, 36.23it/s]

batch 800 loss: 0.4783195346593857


Train, Epoch 4 / 20:  52%|█████▏    | 816/1563 [00:23<00:20, 36.23it/s]

batch 810 loss: 0.5552724808454513


Train, Epoch 4 / 20:  53%|█████▎    | 824/1563 [00:23<00:20, 36.36it/s]

batch 820 loss: 0.4999107033014297


Train, Epoch 4 / 20:  53%|█████▎    | 836/1563 [00:23<00:19, 36.64it/s]

batch 830 loss: 0.4968337804079056


Train, Epoch 4 / 20:  54%|█████▍    | 844/1563 [00:23<00:19, 36.23it/s]

batch 840 loss: 0.52976995408535


Train, Epoch 4 / 20:  55%|█████▍    | 856/1563 [00:24<00:19, 36.06it/s]

batch 850 loss: 0.5797840714454651


Train, Epoch 4 / 20:  55%|█████▌    | 864/1563 [00:24<00:19, 36.11it/s]

batch 860 loss: 0.47233452200889586


Train, Epoch 4 / 20:  56%|█████▌    | 876/1563 [00:24<00:19, 36.02it/s]

batch 870 loss: 0.5033729523420334


Train, Epoch 4 / 20:  57%|█████▋    | 884/1563 [00:24<00:18, 36.29it/s]

batch 880 loss: 0.49735871851444247


Train, Epoch 4 / 20:  57%|█████▋    | 896/1563 [00:25<00:18, 35.90it/s]

batch 890 loss: 0.5322026491165162


Train, Epoch 4 / 20:  58%|█████▊    | 904/1563 [00:25<00:18, 35.58it/s]

batch 900 loss: 0.4819146156311035


Train, Epoch 4 / 20:  59%|█████▊    | 916/1563 [00:25<00:17, 36.02it/s]

batch 910 loss: 0.5272277176380158


Train, Epoch 4 / 20:  59%|█████▉    | 924/1563 [00:26<00:17, 36.23it/s]

batch 920 loss: 0.49677114486694335


Train, Epoch 4 / 20:  60%|█████▉    | 936/1563 [00:26<00:17, 36.31it/s]

batch 930 loss: 0.5159471899271011


Train, Epoch 4 / 20:  60%|██████    | 944/1563 [00:26<00:16, 36.64it/s]

batch 940 loss: 0.5303036242723465


Train, Epoch 4 / 20:  61%|██████    | 956/1563 [00:26<00:16, 36.52it/s]

batch 950 loss: 0.5478211343288422


Train, Epoch 4 / 20:  62%|██████▏   | 964/1563 [00:27<00:16, 36.16it/s]

batch 960 loss: 0.5030693113803864


Train, Epoch 4 / 20:  62%|██████▏   | 976/1563 [00:27<00:16, 36.46it/s]

batch 970 loss: 0.5133628696203232


Train, Epoch 4 / 20:  63%|██████▎   | 984/1563 [00:27<00:16, 35.90it/s]

batch 980 loss: 0.513684430718422


Train, Epoch 4 / 20:  64%|██████▎   | 996/1563 [00:28<00:15, 36.02it/s]

batch 990 loss: 0.4919076472520828


Train, Epoch 4 / 20:  64%|██████▍   | 1004/1563 [00:28<00:15, 35.29it/s]

batch 1000 loss: 0.442838180065155


Train, Epoch 4 / 20:  65%|██████▌   | 1016/1563 [00:28<00:15, 35.29it/s]

batch 1010 loss: 0.5204283326864243


Train, Epoch 4 / 20:  66%|██████▌   | 1024/1563 [00:28<00:15, 33.72it/s]

batch 1020 loss: 0.4802982658147812


Train, Epoch 4 / 20:  66%|██████▋   | 1036/1563 [00:29<00:15, 33.21it/s]

batch 1030 loss: 0.5513124749064445


Train, Epoch 4 / 20:  67%|██████▋   | 1044/1563 [00:29<00:16, 31.66it/s]

batch 1040 loss: 0.5425298482179641


Train, Epoch 4 / 20:  68%|██████▊   | 1056/1563 [00:29<00:15, 32.80it/s]

batch 1050 loss: 0.5689015775918961


Train, Epoch 4 / 20:  68%|██████▊   | 1064/1563 [00:30<00:15, 32.86it/s]

batch 1060 loss: 0.44616167843341825


Train, Epoch 4 / 20:  69%|██████▉   | 1076/1563 [00:30<00:15, 31.76it/s]

batch 1070 loss: 0.47901883721351624


Train, Epoch 4 / 20:  69%|██████▉   | 1084/1563 [00:30<00:15, 31.79it/s]

batch 1080 loss: 0.5340332567691803


Train, Epoch 4 / 20:  70%|███████   | 1096/1563 [00:31<00:13, 34.33it/s]

batch 1090 loss: 0.48508878201246264


Train, Epoch 4 / 20:  71%|███████   | 1104/1563 [00:31<00:12, 35.31it/s]

batch 1100 loss: 0.5088686674833298


Train, Epoch 4 / 20:  71%|███████▏  | 1116/1563 [00:31<00:12, 36.28it/s]

batch 1110 loss: 0.4544645696878433


Train, Epoch 4 / 20:  72%|███████▏  | 1124/1563 [00:31<00:12, 36.01it/s]

batch 1120 loss: 0.4552408277988434


Train, Epoch 4 / 20:  73%|███████▎  | 1136/1563 [00:32<00:11, 35.72it/s]

batch 1130 loss: 0.5046411842107773


Train, Epoch 4 / 20:  73%|███████▎  | 1144/1563 [00:32<00:11, 35.61it/s]

batch 1140 loss: 0.44610860049724577


Train, Epoch 4 / 20:  74%|███████▍  | 1156/1563 [00:32<00:11, 36.07it/s]

batch 1150 loss: 0.47839086353778837


Train, Epoch 4 / 20:  74%|███████▍  | 1164/1563 [00:32<00:10, 36.30it/s]

batch 1160 loss: 0.5053139925003052


Train, Epoch 4 / 20:  75%|███████▌  | 1176/1563 [00:33<00:10, 36.48it/s]

batch 1170 loss: 0.49562784731388093


Train, Epoch 4 / 20:  76%|███████▌  | 1184/1563 [00:33<00:10, 36.47it/s]

batch 1180 loss: 0.5223385006189346


Train, Epoch 4 / 20:  77%|███████▋  | 1196/1563 [00:33<00:10, 36.59it/s]

batch 1190 loss: 0.5358642905950546


Train, Epoch 4 / 20:  77%|███████▋  | 1204/1563 [00:34<00:09, 36.37it/s]

batch 1200 loss: 0.41661613136529924


Train, Epoch 4 / 20:  78%|███████▊  | 1216/1563 [00:34<00:09, 36.75it/s]

batch 1210 loss: 0.4753888964653015


Train, Epoch 4 / 20:  78%|███████▊  | 1224/1563 [00:34<00:09, 36.44it/s]

batch 1220 loss: 0.5218711018562316


Train, Epoch 4 / 20:  79%|███████▉  | 1236/1563 [00:34<00:08, 36.51it/s]

batch 1230 loss: 0.5124060392379761


Train, Epoch 4 / 20:  80%|███████▉  | 1244/1563 [00:35<00:08, 36.42it/s]

batch 1240 loss: 0.5401692926883698


Train, Epoch 4 / 20:  80%|████████  | 1256/1563 [00:35<00:08, 36.22it/s]

batch 1250 loss: 0.5371033728122712


Train, Epoch 4 / 20:  81%|████████  | 1264/1563 [00:35<00:08, 36.08it/s]

batch 1260 loss: 0.5340334177017212


Train, Epoch 4 / 20:  82%|████████▏ | 1276/1563 [00:36<00:07, 36.07it/s]

batch 1270 loss: 0.42105554342269896


Train, Epoch 4 / 20:  82%|████████▏ | 1284/1563 [00:36<00:07, 36.26it/s]

batch 1280 loss: 0.5588739395141602


Train, Epoch 4 / 20:  83%|████████▎ | 1296/1563 [00:36<00:07, 36.03it/s]

batch 1290 loss: 0.5053678274154663


Train, Epoch 4 / 20:  83%|████████▎ | 1304/1563 [00:36<00:07, 36.30it/s]

batch 1300 loss: 0.4941383957862854


Train, Epoch 4 / 20:  84%|████████▍ | 1316/1563 [00:37<00:06, 36.62it/s]

batch 1310 loss: 0.5965093433856964


Train, Epoch 4 / 20:  85%|████████▍ | 1324/1563 [00:37<00:06, 36.46it/s]

batch 1320 loss: 0.5924808979034424


Train, Epoch 4 / 20:  85%|████████▌ | 1336/1563 [00:37<00:06, 35.73it/s]

batch 1330 loss: 0.4995561957359314


Train, Epoch 4 / 20:  86%|████████▌ | 1344/1563 [00:37<00:06, 35.41it/s]

batch 1340 loss: 0.5065113276243209


Train, Epoch 4 / 20:  87%|████████▋ | 1356/1563 [00:38<00:05, 35.96it/s]

batch 1350 loss: 0.4733114093542099


Train, Epoch 4 / 20:  87%|████████▋ | 1364/1563 [00:38<00:05, 35.77it/s]

batch 1360 loss: 0.4807469516992569


Train, Epoch 4 / 20:  88%|████████▊ | 1376/1563 [00:38<00:05, 35.54it/s]

batch 1370 loss: 0.545086145401001


Train, Epoch 4 / 20:  89%|████████▊ | 1384/1563 [00:39<00:05, 35.18it/s]

batch 1380 loss: 0.5368668824434281


Train, Epoch 4 / 20:  89%|████████▉ | 1396/1563 [00:39<00:04, 35.78it/s]

batch 1390 loss: 0.5489593386650086


Train, Epoch 4 / 20:  90%|████████▉ | 1404/1563 [00:39<00:04, 35.09it/s]

batch 1400 loss: 0.4893146246671677


Train, Epoch 4 / 20:  91%|█████████ | 1416/1563 [00:39<00:04, 34.70it/s]

batch 1410 loss: 0.5170228898525238


Train, Epoch 4 / 20:  91%|█████████ | 1424/1563 [00:40<00:03, 35.28it/s]

batch 1420 loss: 0.49539280533790586


Train, Epoch 4 / 20:  92%|█████████▏| 1436/1563 [00:40<00:03, 35.67it/s]

batch 1430 loss: 0.5246042609214783


Train, Epoch 4 / 20:  92%|█████████▏| 1444/1563 [00:40<00:03, 35.24it/s]

batch 1440 loss: 0.5394777894020081


Train, Epoch 4 / 20:  93%|█████████▎| 1456/1563 [00:41<00:03, 33.55it/s]

batch 1450 loss: 0.5971612930297852


Train, Epoch 4 / 20:  94%|█████████▎| 1464/1563 [00:41<00:03, 32.79it/s]

batch 1460 loss: 0.5333360403776168


Train, Epoch 4 / 20:  94%|█████████▍| 1476/1563 [00:41<00:02, 33.04it/s]

batch 1470 loss: 0.5237987995147705


Train, Epoch 4 / 20:  95%|█████████▍| 1484/1563 [00:42<00:02, 33.10it/s]

batch 1480 loss: 0.5331501305103302


Train, Epoch 4 / 20:  96%|█████████▌| 1496/1563 [00:42<00:02, 32.90it/s]

batch 1490 loss: 0.48220327496528625


Train, Epoch 4 / 20:  96%|█████████▌| 1504/1563 [00:42<00:01, 32.42it/s]

batch 1500 loss: 0.5235623419284821


Train, Epoch 4 / 20:  97%|█████████▋| 1516/1563 [00:43<00:01, 30.58it/s]

batch 1510 loss: 0.4962511986494064


Train, Epoch 4 / 20:  98%|█████████▊| 1524/1563 [00:43<00:01, 32.78it/s]

batch 1520 loss: 0.43516184091567994


Train, Epoch 4 / 20:  98%|█████████▊| 1536/1563 [00:43<00:00, 34.35it/s]

batch 1530 loss: 0.4732425272464752


Train, Epoch 4 / 20:  99%|█████████▉| 1544/1563 [00:43<00:00, 34.53it/s]

batch 1540 loss: 0.5584754317998886


Train, Epoch 4 / 20: 100%|█████████▉| 1556/1563 [00:44<00:00, 35.12it/s]

batch 1550 loss: 0.46324335038661957


Train, Epoch 4 / 20: 100%|██████████| 1563/1563 [00:44<00:00, 35.23it/s]


batch 1560 loss: 0.5025962650775909


Test, Epoch 4 / 20: 100%|██████████| 1563/1563 [00:20<00:00, 75.82it/s]


Epoch 4, loss: 0.5222357081532478, accuracy: 0.73772


Train, Epoch 5 / 20:   1%|          | 16/1563 [00:00<00:45, 33.87it/s]

batch 10 loss: 0.46712797284126284


Train, Epoch 5 / 20:   2%|▏         | 24/1563 [00:00<00:45, 33.71it/s]

batch 20 loss: 0.516248294711113


Train, Epoch 5 / 20:   2%|▏         | 36/1563 [00:01<00:47, 32.37it/s]

batch 30 loss: 0.5530150294303894


Train, Epoch 5 / 20:   3%|▎         | 44/1563 [00:01<00:48, 31.38it/s]

batch 40 loss: 0.5256787121295929


Train, Epoch 5 / 20:   4%|▎         | 56/1563 [00:01<00:47, 31.40it/s]

batch 50 loss: 0.5204282373189926


Train, Epoch 5 / 20:   4%|▍         | 64/1563 [00:01<00:47, 31.80it/s]

batch 60 loss: 0.527277621626854


Train, Epoch 5 / 20:   5%|▍         | 76/1563 [00:02<00:46, 32.18it/s]

batch 70 loss: 0.4804700642824173


Train, Epoch 5 / 20:   5%|▌         | 80/1563 [00:02<00:47, 30.93it/s]

batch 80 loss: 0.4463575094938278


Train, Epoch 5 / 20:   6%|▌         | 95/1563 [00:03<00:50, 29.14it/s]

batch 90 loss: 0.5098332107067108


Train, Epoch 5 / 20:   7%|▋         | 107/1563 [00:03<00:44, 32.87it/s]

batch 100 loss: 0.5904511868953705


Train, Epoch 5 / 20:   7%|▋         | 115/1563 [00:03<00:46, 31.34it/s]

batch 110 loss: 0.5222261011600494


Train, Epoch 5 / 20:   8%|▊         | 122/1563 [00:03<00:50, 28.71it/s]

batch 120 loss: 0.4585348829627037


Train, Epoch 5 / 20:   8%|▊         | 131/1563 [00:04<01:17, 18.48it/s]

batch 130 loss: 0.5005964070558548


Train, Epoch 5 / 20:   9%|▉         | 141/1563 [00:05<01:46, 13.34it/s]

batch 140 loss: 0.5984780699014663


Train, Epoch 5 / 20:  10%|▉         | 154/1563 [00:06<01:08, 20.56it/s]

batch 150 loss: 0.4411861181259155


Train, Epoch 5 / 20:  11%|█         | 166/1563 [00:06<00:47, 29.24it/s]

batch 160 loss: 0.5146209895610809


Train, Epoch 5 / 20:  11%|█         | 174/1563 [00:06<00:43, 32.12it/s]

batch 170 loss: 0.4787428379058838


Train, Epoch 5 / 20:  12%|█▏        | 186/1563 [00:06<00:40, 34.04it/s]

batch 180 loss: 0.47821931838989257


Train, Epoch 5 / 20:  12%|█▏        | 194/1563 [00:07<00:39, 34.88it/s]

batch 190 loss: 0.5423178642988205


Train, Epoch 5 / 20:  13%|█▎        | 206/1563 [00:07<00:37, 36.01it/s]

batch 200 loss: 0.5489064961671829


Train, Epoch 5 / 20:  14%|█▎        | 214/1563 [00:07<00:37, 35.71it/s]

batch 210 loss: 0.4962716907262802


Train, Epoch 5 / 20:  14%|█▍        | 226/1563 [00:08<00:37, 35.76it/s]

batch 220 loss: 0.6030582547187805


Train, Epoch 5 / 20:  15%|█▍        | 234/1563 [00:08<00:37, 35.81it/s]

batch 230 loss: 0.4447723478078842


Train, Epoch 5 / 20:  16%|█▌        | 246/1563 [00:08<00:36, 35.82it/s]

batch 240 loss: 0.45862154960632323


Train, Epoch 5 / 20:  16%|█▋        | 254/1563 [00:08<00:36, 35.77it/s]

batch 250 loss: 0.47507821023464203


Train, Epoch 5 / 20:  17%|█▋        | 266/1563 [00:09<00:35, 36.05it/s]

batch 260 loss: 0.43446812927722933


Train, Epoch 5 / 20:  18%|█▊        | 274/1563 [00:09<00:35, 36.50it/s]

batch 270 loss: 0.6052414625883102


Train, Epoch 5 / 20:  18%|█▊        | 286/1563 [00:09<00:35, 35.93it/s]

batch 280 loss: 0.47770237028598783


Train, Epoch 5 / 20:  19%|█▉        | 294/1563 [00:09<00:35, 35.72it/s]

batch 290 loss: 0.515381607413292


Train, Epoch 5 / 20:  20%|█▉        | 306/1563 [00:10<00:35, 35.51it/s]

batch 300 loss: 0.46044524013996124


Train, Epoch 5 / 20:  20%|██        | 314/1563 [00:10<00:35, 35.63it/s]

batch 310 loss: 0.3781991213560104


Train, Epoch 5 / 20:  21%|██        | 326/1563 [00:10<00:34, 35.73it/s]

batch 320 loss: 0.4386580392718315


Train, Epoch 5 / 20:  21%|██▏       | 334/1563 [00:11<00:34, 35.85it/s]

batch 330 loss: 0.45812323987483977


Train, Epoch 5 / 20:  22%|██▏       | 346/1563 [00:11<00:33, 36.33it/s]

batch 340 loss: 0.4108631521463394


Train, Epoch 5 / 20:  23%|██▎       | 354/1563 [00:11<00:33, 36.59it/s]

batch 350 loss: 0.49239412546157835


Train, Epoch 5 / 20:  23%|██▎       | 366/1563 [00:11<00:33, 35.90it/s]

batch 360 loss: 0.45268000960350036


Train, Epoch 5 / 20:  24%|██▍       | 374/1563 [00:12<00:32, 36.09it/s]

batch 370 loss: 0.544361124932766


Train, Epoch 5 / 20:  25%|██▍       | 386/1563 [00:12<00:32, 35.95it/s]

batch 380 loss: 0.41613690853118895


Train, Epoch 5 / 20:  25%|██▌       | 394/1563 [00:12<00:33, 35.28it/s]

batch 390 loss: 0.5148210942745208


Train, Epoch 5 / 20:  26%|██▌       | 406/1563 [00:13<00:35, 32.88it/s]

batch 400 loss: 0.5188795775175095


Train, Epoch 5 / 20:  26%|██▋       | 414/1563 [00:13<00:35, 32.33it/s]

batch 410 loss: 0.4740517124533653


Train, Epoch 5 / 20:  27%|██▋       | 426/1563 [00:13<00:34, 32.56it/s]

batch 420 loss: 0.49266176819801333


Train, Epoch 5 / 20:  28%|██▊       | 434/1563 [00:14<00:34, 32.95it/s]

batch 430 loss: 0.5213297098875046


Train, Epoch 5 / 20:  29%|██▊       | 446/1563 [00:14<00:34, 32.68it/s]

batch 440 loss: 0.5382419317960739


Train, Epoch 5 / 20:  29%|██▉       | 454/1563 [00:14<00:34, 32.31it/s]

batch 450 loss: 0.4414970278739929


Train, Epoch 5 / 20:  30%|██▉       | 466/1563 [00:14<00:33, 32.57it/s]

batch 460 loss: 0.5432571679353714


Train, Epoch 5 / 20:  30%|███       | 474/1563 [00:15<00:31, 34.06it/s]

batch 470 loss: 0.5171219915151596


Train, Epoch 5 / 20:  31%|███       | 486/1563 [00:15<00:29, 35.95it/s]

batch 480 loss: 0.5234004706144333


Train, Epoch 5 / 20:  32%|███▏      | 494/1563 [00:15<00:29, 35.67it/s]

batch 490 loss: 0.5681058764457703


Train, Epoch 5 / 20:  32%|███▏      | 506/1563 [00:16<00:29, 35.85it/s]

batch 500 loss: 0.5305239528417587


Train, Epoch 5 / 20:  33%|███▎      | 514/1563 [00:16<00:29, 35.84it/s]

batch 510 loss: 0.4823776721954346


Train, Epoch 5 / 20:  34%|███▎      | 526/1563 [00:16<00:28, 36.17it/s]

batch 520 loss: 0.4937997326254845


Train, Epoch 5 / 20:  34%|███▍      | 534/1563 [00:16<00:28, 36.22it/s]

batch 530 loss: 0.4219102293252945


Train, Epoch 5 / 20:  35%|███▍      | 546/1563 [00:17<00:28, 36.29it/s]

batch 540 loss: 0.5123472303152085


Train, Epoch 5 / 20:  35%|███▌      | 554/1563 [00:17<00:27, 36.21it/s]

batch 550 loss: 0.6212632924318313


Train, Epoch 5 / 20:  36%|███▌      | 566/1563 [00:17<00:27, 36.61it/s]

batch 560 loss: 0.427026829123497


Train, Epoch 5 / 20:  37%|███▋      | 574/1563 [00:17<00:27, 36.35it/s]

batch 570 loss: 0.5357737213373184


Train, Epoch 5 / 20:  37%|███▋      | 586/1563 [00:18<00:27, 35.02it/s]

batch 580 loss: 0.5038760095834732


Train, Epoch 5 / 20:  38%|███▊      | 594/1563 [00:18<00:27, 35.51it/s]

batch 590 loss: 0.5542424768209457


Train, Epoch 5 / 20:  39%|███▉      | 606/1563 [00:18<00:26, 36.19it/s]

batch 600 loss: 0.4531077057123184


Train, Epoch 5 / 20:  39%|███▉      | 614/1563 [00:19<00:26, 35.42it/s]

batch 610 loss: 0.5220688253641128


Train, Epoch 5 / 20:  40%|████      | 626/1563 [00:19<00:26, 35.59it/s]

batch 620 loss: 0.49405714869499207


Train, Epoch 5 / 20:  41%|████      | 634/1563 [00:19<00:26, 35.72it/s]

batch 630 loss: 0.5302862912416458


Train, Epoch 5 / 20:  41%|████▏     | 646/1563 [00:20<00:25, 35.48it/s]

batch 640 loss: 0.4402898371219635


Train, Epoch 5 / 20:  42%|████▏     | 654/1563 [00:20<00:25, 35.19it/s]

batch 650 loss: 0.5285621136426926


Train, Epoch 5 / 20:  43%|████▎     | 666/1563 [00:20<00:25, 35.76it/s]

batch 660 loss: 0.4414363980293274


Train, Epoch 5 / 20:  43%|████▎     | 674/1563 [00:20<00:24, 35.88it/s]

batch 670 loss: 0.46266413331031797


Train, Epoch 5 / 20:  44%|████▍     | 686/1563 [00:21<00:24, 36.14it/s]

batch 680 loss: 0.44354952275753023


Train, Epoch 5 / 20:  44%|████▍     | 694/1563 [00:21<00:23, 36.32it/s]

batch 690 loss: 0.4628113090991974


Train, Epoch 5 / 20:  45%|████▌     | 706/1563 [00:21<00:23, 36.69it/s]

batch 700 loss: 0.5158483117818833


Train, Epoch 5 / 20:  46%|████▌     | 714/1563 [00:21<00:23, 36.47it/s]

batch 710 loss: 0.5740375697612763


Train, Epoch 5 / 20:  46%|████▋     | 726/1563 [00:22<00:22, 36.48it/s]

batch 720 loss: 0.49969214498996734


Train, Epoch 5 / 20:  47%|████▋     | 734/1563 [00:22<00:22, 36.74it/s]

batch 730 loss: 0.5212650120258331


Train, Epoch 5 / 20:  48%|████▊     | 746/1563 [00:22<00:22, 36.79it/s]

batch 740 loss: 0.4913815498352051


Train, Epoch 5 / 20:  48%|████▊     | 754/1563 [00:22<00:22, 36.49it/s]

batch 750 loss: 0.5124791115522385


Train, Epoch 5 / 20:  49%|████▉     | 766/1563 [00:23<00:22, 35.61it/s]

batch 760 loss: 0.4564647749066353


Train, Epoch 5 / 20:  50%|████▉     | 774/1563 [00:23<00:22, 35.61it/s]

batch 770 loss: 0.468264439702034


Train, Epoch 5 / 20:  50%|█████     | 786/1563 [00:23<00:22, 35.18it/s]

batch 780 loss: 0.5008181601762771


Train, Epoch 5 / 20:  51%|█████     | 794/1563 [00:24<00:22, 34.62it/s]

batch 790 loss: 0.4765707224607468


Train, Epoch 5 / 20:  52%|█████▏    | 806/1563 [00:24<00:21, 34.86it/s]

batch 800 loss: 0.46916809380054475


Train, Epoch 5 / 20:  52%|█████▏    | 814/1563 [00:24<00:21, 35.12it/s]

batch 810 loss: 0.5162940829992294


Train, Epoch 5 / 20:  53%|█████▎    | 826/1563 [00:25<00:22, 33.38it/s]

batch 820 loss: 0.5126346811652184


Train, Epoch 5 / 20:  53%|█████▎    | 834/1563 [00:25<00:22, 32.50it/s]

batch 830 loss: 0.49946115612983705


Train, Epoch 5 / 20:  54%|█████▍    | 846/1563 [00:25<00:22, 32.43it/s]

batch 840 loss: 0.48712859451770785


Train, Epoch 5 / 20:  55%|█████▍    | 854/1563 [00:25<00:21, 32.31it/s]

batch 850 loss: 0.45593545734882357


Train, Epoch 5 / 20:  55%|█████▌    | 866/1563 [00:26<00:21, 32.20it/s]

batch 860 loss: 0.4968332529067993


Train, Epoch 5 / 20:  56%|█████▌    | 874/1563 [00:26<00:21, 32.35it/s]

batch 870 loss: 0.44498409926891325


Train, Epoch 5 / 20:  57%|█████▋    | 886/1563 [00:26<00:20, 32.68it/s]

batch 880 loss: 0.5554468333721161


Train, Epoch 5 / 20:  57%|█████▋    | 894/1563 [00:27<00:21, 31.60it/s]

batch 890 loss: 0.5345127612352372


Train, Epoch 5 / 20:  58%|█████▊    | 906/1563 [00:27<00:18, 34.64it/s]

batch 900 loss: 0.3906420975923538


Train, Epoch 5 / 20:  58%|█████▊    | 914/1563 [00:27<00:18, 35.32it/s]

batch 910 loss: 0.5007094621658326


Train, Epoch 5 / 20:  59%|█████▉    | 926/1563 [00:28<00:17, 35.80it/s]

batch 920 loss: 0.44751707911491395


Train, Epoch 5 / 20:  60%|█████▉    | 934/1563 [00:28<00:17, 35.52it/s]

batch 930 loss: 0.47313485145568845


Train, Epoch 5 / 20:  61%|██████    | 946/1563 [00:28<00:17, 35.13it/s]

batch 940 loss: 0.4916870027780533


Train, Epoch 5 / 20:  61%|██████    | 954/1563 [00:28<00:17, 35.30it/s]

batch 950 loss: 0.48031383454799653


Train, Epoch 5 / 20:  62%|██████▏   | 966/1563 [00:29<00:16, 35.43it/s]

batch 960 loss: 0.46709805727005005


Train, Epoch 5 / 20:  62%|██████▏   | 974/1563 [00:29<00:16, 35.21it/s]

batch 970 loss: 0.49470632076263427


Train, Epoch 5 / 20:  63%|██████▎   | 986/1563 [00:29<00:16, 35.81it/s]

batch 980 loss: 0.44960168898105624


Train, Epoch 5 / 20:  64%|██████▎   | 994/1563 [00:29<00:15, 36.13it/s]

batch 990 loss: 0.4303271025419235


Train, Epoch 5 / 20:  64%|██████▍   | 1006/1563 [00:30<00:15, 35.78it/s]

batch 1000 loss: 0.43587868213653563


Train, Epoch 5 / 20:  65%|██████▍   | 1014/1563 [00:30<00:15, 35.45it/s]

batch 1010 loss: 0.42970786690711976


Train, Epoch 5 / 20:  66%|██████▌   | 1026/1563 [00:30<00:15, 35.42it/s]

batch 1020 loss: 0.41481589078903197


Train, Epoch 5 / 20:  66%|██████▌   | 1034/1563 [00:31<00:14, 35.64it/s]

batch 1030 loss: 0.45847760438919066


Train, Epoch 5 / 20:  67%|██████▋   | 1046/1563 [00:31<00:14, 35.56it/s]

batch 1040 loss: 0.4277201682329178


Train, Epoch 5 / 20:  67%|██████▋   | 1054/1563 [00:31<00:14, 35.86it/s]

batch 1050 loss: 0.5100841134786606


Train, Epoch 5 / 20:  68%|██████▊   | 1066/1563 [00:32<00:13, 36.11it/s]

batch 1060 loss: 0.49898103773593905


Train, Epoch 5 / 20:  69%|██████▊   | 1074/1563 [00:32<00:13, 35.71it/s]

batch 1070 loss: 0.49885274171829225


Train, Epoch 5 / 20:  69%|██████▉   | 1086/1563 [00:32<00:13, 35.57it/s]

batch 1080 loss: 0.4972216784954071


Train, Epoch 5 / 20:  70%|██████▉   | 1094/1563 [00:32<00:13, 35.80it/s]

batch 1090 loss: 0.5234710067510605


Train, Epoch 5 / 20:  71%|███████   | 1106/1563 [00:33<00:12, 36.28it/s]

batch 1100 loss: 0.4879178166389465


Train, Epoch 5 / 20:  71%|███████▏  | 1114/1563 [00:33<00:12, 35.73it/s]

batch 1110 loss: 0.5092011794447899


Train, Epoch 5 / 20:  72%|███████▏  | 1126/1563 [00:33<00:12, 35.92it/s]

batch 1120 loss: 0.522504198551178


Train, Epoch 5 / 20:  73%|███████▎  | 1134/1563 [00:33<00:12, 35.72it/s]

batch 1130 loss: 0.48372054398059844


Train, Epoch 5 / 20:  73%|███████▎  | 1146/1563 [00:34<00:11, 35.58it/s]

batch 1140 loss: 0.48336014449596404


Train, Epoch 5 / 20:  74%|███████▍  | 1154/1563 [00:34<00:11, 35.73it/s]

batch 1150 loss: 0.4383611634373665


Train, Epoch 5 / 20:  75%|███████▍  | 1166/1563 [00:34<00:10, 36.15it/s]

batch 1160 loss: 0.3915285676717758


Train, Epoch 5 / 20:  75%|███████▌  | 1174/1563 [00:35<00:10, 35.78it/s]

batch 1170 loss: 0.5428364083170891


Train, Epoch 5 / 20:  76%|███████▌  | 1186/1563 [00:35<00:10, 35.31it/s]

batch 1180 loss: 0.46967832148075106


Train, Epoch 5 / 20:  76%|███████▋  | 1194/1563 [00:35<00:10, 36.11it/s]

batch 1190 loss: 0.4700766921043396


Train, Epoch 5 / 20:  77%|███████▋  | 1206/1563 [00:35<00:09, 35.79it/s]

batch 1200 loss: 0.4510915860533714


Train, Epoch 5 / 20:  78%|███████▊  | 1214/1563 [00:36<00:09, 35.87it/s]

batch 1210 loss: 0.4649913892149925


Train, Epoch 5 / 20:  78%|███████▊  | 1226/1563 [00:36<00:09, 36.18it/s]

batch 1220 loss: 0.46111405193805693


Train, Epoch 5 / 20:  79%|███████▉  | 1234/1563 [00:36<00:08, 36.57it/s]

batch 1230 loss: 0.4886495232582092


Train, Epoch 5 / 20:  80%|███████▉  | 1246/1563 [00:37<00:08, 36.68it/s]

batch 1240 loss: 0.4417837828397751


Train, Epoch 5 / 20:  80%|████████  | 1254/1563 [00:37<00:08, 36.05it/s]

batch 1250 loss: 0.47132517397403717


Train, Epoch 5 / 20:  81%|████████  | 1266/1563 [00:37<00:08, 33.68it/s]

batch 1260 loss: 0.5040221571922302


Train, Epoch 5 / 20:  82%|████████▏ | 1274/1563 [00:37<00:08, 33.19it/s]

batch 1270 loss: 0.4670365899801254


Train, Epoch 5 / 20:  82%|████████▏ | 1286/1563 [00:38<00:08, 32.75it/s]

batch 1280 loss: 0.4431053638458252


Train, Epoch 5 / 20:  83%|████████▎ | 1294/1563 [00:38<00:08, 32.97it/s]

batch 1290 loss: 0.47227691411972045


Train, Epoch 5 / 20:  84%|████████▎ | 1306/1563 [00:38<00:08, 32.04it/s]

batch 1300 loss: 0.44177992939949035


Train, Epoch 5 / 20:  84%|████████▍ | 1314/1563 [00:39<00:07, 32.71it/s]

batch 1310 loss: 0.5044869691133499


Train, Epoch 5 / 20:  85%|████████▍ | 1326/1563 [00:39<00:07, 32.50it/s]

batch 1320 loss: 0.4775142341852188


Train, Epoch 5 / 20:  85%|████████▌ | 1334/1563 [00:39<00:07, 32.63it/s]

batch 1330 loss: 0.4809853553771973


Train, Epoch 5 / 20:  86%|████████▌ | 1346/1563 [00:40<00:06, 34.69it/s]

batch 1340 loss: 0.4979224592447281


Train, Epoch 5 / 20:  87%|████████▋ | 1354/1563 [00:40<00:05, 35.66it/s]

batch 1350 loss: 0.5231496632099152


Train, Epoch 5 / 20:  87%|████████▋ | 1366/1563 [00:40<00:05, 36.36it/s]

batch 1360 loss: 0.5582440882921219


Train, Epoch 5 / 20:  88%|████████▊ | 1374/1563 [00:40<00:05, 36.38it/s]

batch 1370 loss: 0.4592572212219238


Train, Epoch 5 / 20:  89%|████████▊ | 1386/1563 [00:41<00:04, 36.58it/s]

batch 1380 loss: 0.5048351347446441


Train, Epoch 5 / 20:  89%|████████▉ | 1394/1563 [00:41<00:04, 36.45it/s]

batch 1390 loss: 0.502007532119751


Train, Epoch 5 / 20:  90%|████████▉ | 1406/1563 [00:41<00:04, 36.25it/s]

batch 1400 loss: 0.4943290963768959


Train, Epoch 5 / 20:  90%|█████████ | 1414/1563 [00:41<00:04, 36.46it/s]

batch 1410 loss: 0.5731282442808151


Train, Epoch 5 / 20:  91%|█████████ | 1426/1563 [00:42<00:03, 36.82it/s]

batch 1420 loss: 0.5252493530511856


Train, Epoch 5 / 20:  92%|█████████▏| 1434/1563 [00:42<00:03, 36.38it/s]

batch 1430 loss: 0.48405580818653104


Train, Epoch 5 / 20:  93%|█████████▎| 1446/1563 [00:42<00:03, 36.07it/s]

batch 1440 loss: 0.45444510877132416


Train, Epoch 5 / 20:  93%|█████████▎| 1454/1563 [00:43<00:02, 36.41it/s]

batch 1450 loss: 0.5158313512802124


Train, Epoch 5 / 20:  94%|█████████▍| 1466/1563 [00:43<00:02, 36.32it/s]

batch 1460 loss: 0.49048579335212705


Train, Epoch 5 / 20:  94%|█████████▍| 1474/1563 [00:43<00:02, 36.18it/s]

batch 1470 loss: 0.5078090906143189


Train, Epoch 5 / 20:  95%|█████████▌| 1486/1563 [00:43<00:02, 36.39it/s]

batch 1480 loss: 0.4669625490903854


Train, Epoch 5 / 20:  96%|█████████▌| 1494/1563 [00:44<00:01, 36.56it/s]

batch 1490 loss: 0.46170868575572965


Train, Epoch 5 / 20:  96%|█████████▋| 1506/1563 [00:44<00:01, 36.21it/s]

batch 1500 loss: 0.4338130861520767


Train, Epoch 5 / 20:  97%|█████████▋| 1514/1563 [00:44<00:01, 36.30it/s]

batch 1510 loss: 0.5849600940942764


Train, Epoch 5 / 20:  98%|█████████▊| 1526/1563 [00:45<00:01, 36.04it/s]

batch 1520 loss: 0.42668085992336274


Train, Epoch 5 / 20:  98%|█████████▊| 1534/1563 [00:45<00:00, 36.36it/s]

batch 1530 loss: 0.54906624853611


Train, Epoch 5 / 20:  99%|█████████▉| 1546/1563 [00:45<00:00, 35.63it/s]

batch 1540 loss: 0.5309282571077347


Train, Epoch 5 / 20:  99%|█████████▉| 1554/1563 [00:45<00:00, 35.40it/s]

batch 1550 loss: 0.5580377727746964


Train, Epoch 5 / 20: 100%|██████████| 1563/1563 [00:46<00:00, 33.94it/s]


batch 1560 loss: 0.44910692870616914


Test, Epoch 5 / 20: 100%|██████████| 1563/1563 [00:21<00:00, 74.07it/s]


Epoch 5, loss: 0.49441581277370455, accuracy: 0.7598


Train, Epoch 6 / 20:   1%|          | 16/1563 [00:00<00:43, 35.95it/s]

batch 10 loss: 0.42054054141044617


Train, Epoch 6 / 20:   2%|▏         | 24/1563 [00:00<00:42, 36.12it/s]

batch 20 loss: 0.44422261118888856


Train, Epoch 6 / 20:   2%|▏         | 36/1563 [00:01<00:42, 35.72it/s]

batch 30 loss: 0.5151442348957062


Train, Epoch 6 / 20:   3%|▎         | 44/1563 [00:01<00:42, 35.36it/s]

batch 40 loss: 0.45426720976829527


Train, Epoch 6 / 20:   4%|▎         | 56/1563 [00:01<00:41, 36.06it/s]

batch 50 loss: 0.39225268363952637


Train, Epoch 6 / 20:   4%|▍         | 64/1563 [00:01<00:41, 36.01it/s]

batch 60 loss: 0.5167060136795044


Train, Epoch 6 / 20:   5%|▍         | 76/1563 [00:02<00:40, 36.29it/s]

batch 70 loss: 0.47589507400989534


Train, Epoch 6 / 20:   5%|▌         | 84/1563 [00:02<00:41, 35.80it/s]

batch 80 loss: 0.5045936077833175


Train, Epoch 6 / 20:   6%|▌         | 96/1563 [00:02<00:40, 35.81it/s]

batch 90 loss: 0.4448930621147156


Train, Epoch 6 / 20:   7%|▋         | 104/1563 [00:02<00:40, 36.01it/s]

batch 100 loss: 0.48480409681797026


Train, Epoch 6 / 20:   7%|▋         | 116/1563 [00:03<00:40, 35.59it/s]

batch 110 loss: 0.4641663029789925


Train, Epoch 6 / 20:   8%|▊         | 124/1563 [00:03<00:40, 35.58it/s]

batch 120 loss: 0.34360726475715636


Train, Epoch 6 / 20:   9%|▊         | 136/1563 [00:03<00:39, 36.29it/s]

batch 130 loss: 0.6457798957824707


Train, Epoch 6 / 20:   9%|▉         | 144/1563 [00:04<00:39, 36.36it/s]

batch 140 loss: 0.4124087929725647


Train, Epoch 6 / 20:  10%|▉         | 156/1563 [00:04<00:39, 35.65it/s]

batch 150 loss: 0.4528817892074585


Train, Epoch 6 / 20:  10%|█         | 164/1563 [00:04<00:39, 35.28it/s]

batch 160 loss: 0.5023148059844971


Train, Epoch 6 / 20:  11%|█▏        | 176/1563 [00:04<00:38, 35.75it/s]

batch 170 loss: 0.48200244307518003


Train, Epoch 6 / 20:  12%|█▏        | 184/1563 [00:05<00:38, 36.07it/s]

batch 180 loss: 0.47561340034008026


Train, Epoch 6 / 20:  13%|█▎        | 196/1563 [00:05<00:38, 35.81it/s]

batch 190 loss: 0.5057515561580658


Train, Epoch 6 / 20:  13%|█▎        | 204/1563 [00:05<00:37, 36.28it/s]

batch 200 loss: 0.434519550204277


Train, Epoch 6 / 20:  14%|█▍        | 216/1563 [00:06<00:37, 35.58it/s]

batch 210 loss: 0.46388896703720095


Train, Epoch 6 / 20:  14%|█▍        | 224/1563 [00:06<00:37, 36.01it/s]

batch 220 loss: 0.5284201830625535


Train, Epoch 6 / 20:  15%|█▌        | 236/1563 [00:06<00:36, 36.30it/s]

batch 230 loss: 0.4469392612576485


Train, Epoch 6 / 20:  16%|█▌        | 244/1563 [00:06<00:35, 36.77it/s]

batch 240 loss: 0.45586475282907485


Train, Epoch 6 / 20:  16%|█▋        | 256/1563 [00:07<00:37, 34.85it/s]

batch 250 loss: 0.41563750654459


Train, Epoch 6 / 20:  17%|█▋        | 264/1563 [00:07<00:39, 32.92it/s]

batch 260 loss: 0.47365535497665406


Train, Epoch 6 / 20:  18%|█▊        | 276/1563 [00:07<00:38, 33.28it/s]

batch 270 loss: 0.4870681673288345


Train, Epoch 6 / 20:  18%|█▊        | 284/1563 [00:07<00:38, 32.96it/s]

batch 280 loss: 0.4548104047775269


Train, Epoch 6 / 20:  19%|█▉        | 296/1563 [00:08<00:39, 32.18it/s]

batch 290 loss: 0.41620550453662875


Train, Epoch 6 / 20:  19%|█▉        | 304/1563 [00:08<00:39, 31.67it/s]

batch 300 loss: 0.4895533561706543


Train, Epoch 6 / 20:  20%|██        | 316/1563 [00:09<00:40, 31.02it/s]

batch 310 loss: 0.4902148127555847


Train, Epoch 6 / 20:  21%|██        | 324/1563 [00:09<00:40, 30.92it/s]

batch 320 loss: 0.5014690726995468


Train, Epoch 6 / 20:  21%|██▏       | 336/1563 [00:09<00:35, 34.11it/s]

batch 330 loss: 0.4697213679552078


Train, Epoch 6 / 20:  22%|██▏       | 344/1563 [00:09<00:34, 35.07it/s]

batch 340 loss: 0.5004591345787048


Train, Epoch 6 / 20:  23%|██▎       | 356/1563 [00:10<00:33, 36.18it/s]

batch 350 loss: 0.5370377004146576


Train, Epoch 6 / 20:  23%|██▎       | 364/1563 [00:10<00:33, 36.27it/s]

batch 360 loss: 0.4552248477935791


Train, Epoch 6 / 20:  24%|██▍       | 376/1563 [00:10<00:32, 36.36it/s]

batch 370 loss: 0.49869694113731383


Train, Epoch 6 / 20:  25%|██▍       | 384/1563 [00:10<00:32, 36.66it/s]

batch 380 loss: 0.4642520219087601


Train, Epoch 6 / 20:  25%|██▌       | 396/1563 [00:11<00:32, 36.44it/s]

batch 390 loss: 0.4746613383293152


Train, Epoch 6 / 20:  26%|██▌       | 404/1563 [00:11<00:32, 36.11it/s]

batch 400 loss: 0.4731251299381256


Train, Epoch 6 / 20:  27%|██▋       | 416/1563 [00:11<00:31, 36.06it/s]

batch 410 loss: 0.4434158354997635


Train, Epoch 6 / 20:  27%|██▋       | 424/1563 [00:12<00:31, 35.82it/s]

batch 420 loss: 0.481814569234848


Train, Epoch 6 / 20:  28%|██▊       | 436/1563 [00:12<00:30, 36.40it/s]

batch 430 loss: 0.40317895710468293


Train, Epoch 6 / 20:  28%|██▊       | 444/1563 [00:12<00:31, 35.57it/s]

batch 440 loss: 0.479682257771492


Train, Epoch 6 / 20:  29%|██▉       | 456/1563 [00:12<00:31, 35.33it/s]

batch 450 loss: 0.4734911471605301


Train, Epoch 6 / 20:  30%|██▉       | 464/1563 [00:13<00:31, 35.10it/s]

batch 460 loss: 0.4803382933139801


Train, Epoch 6 / 20:  30%|███       | 476/1563 [00:13<00:30, 35.57it/s]

batch 470 loss: 0.42135116159915925


Train, Epoch 6 / 20:  31%|███       | 484/1563 [00:13<00:30, 35.35it/s]

batch 480 loss: 0.5105252087116241


Train, Epoch 6 / 20:  32%|███▏      | 496/1563 [00:14<00:29, 35.97it/s]

batch 490 loss: 0.4655339360237122


Train, Epoch 6 / 20:  32%|███▏      | 504/1563 [00:14<00:29, 36.46it/s]

batch 500 loss: 0.4558369368314743


Train, Epoch 6 / 20:  33%|███▎      | 516/1563 [00:14<00:28, 36.18it/s]

batch 510 loss: 0.49602687954902647


Train, Epoch 6 / 20:  34%|███▎      | 524/1563 [00:14<00:28, 36.55it/s]

batch 520 loss: 0.5532845795154572


Train, Epoch 6 / 20:  34%|███▍      | 536/1563 [00:15<00:28, 36.55it/s]

batch 530 loss: 0.4440539062023163


Train, Epoch 6 / 20:  35%|███▍      | 544/1563 [00:15<00:27, 36.56it/s]

batch 540 loss: 0.47560946345329286


Train, Epoch 6 / 20:  36%|███▌      | 556/1563 [00:15<00:27, 35.96it/s]

batch 550 loss: 0.4400563731789589


Train, Epoch 6 / 20:  36%|███▌      | 564/1563 [00:15<00:27, 36.06it/s]

batch 560 loss: 0.4706175237894058


Train, Epoch 6 / 20:  37%|███▋      | 576/1563 [00:16<00:27, 36.50it/s]

batch 570 loss: 0.42470984905958176


Train, Epoch 6 / 20:  37%|███▋      | 584/1563 [00:16<00:26, 36.45it/s]

batch 580 loss: 0.4547666072845459


Train, Epoch 6 / 20:  38%|███▊      | 596/1563 [00:16<00:26, 35.92it/s]

batch 590 loss: 0.4622296929359436


Train, Epoch 6 / 20:  39%|███▊      | 604/1563 [00:17<00:26, 35.54it/s]

batch 600 loss: 0.5392166525125504


Train, Epoch 6 / 20:  39%|███▉      | 616/1563 [00:17<00:26, 36.14it/s]

batch 610 loss: 0.4479658991098404


Train, Epoch 6 / 20:  40%|███▉      | 624/1563 [00:17<00:26, 35.98it/s]

batch 620 loss: 0.5135715037584305


Train, Epoch 6 / 20:  41%|████      | 636/1563 [00:17<00:25, 35.74it/s]

batch 630 loss: 0.46940533220767977


Train, Epoch 6 / 20:  41%|████      | 644/1563 [00:18<00:25, 36.03it/s]

batch 640 loss: 0.5445496052503586


Train, Epoch 6 / 20:  42%|████▏     | 656/1563 [00:18<00:24, 36.30it/s]

batch 650 loss: 0.4492963194847107


Train, Epoch 6 / 20:  42%|████▏     | 664/1563 [00:18<00:25, 35.76it/s]

batch 660 loss: 0.5032896280288697


Train, Epoch 6 / 20:  43%|████▎     | 676/1563 [00:19<00:24, 35.75it/s]

batch 670 loss: 0.4606612592935562


Train, Epoch 6 / 20:  44%|████▍     | 684/1563 [00:19<00:24, 35.84it/s]

batch 680 loss: 0.46002599596977234


Train, Epoch 6 / 20:  45%|████▍     | 696/1563 [00:19<00:26, 32.86it/s]

batch 690 loss: 0.4249632120132446


Train, Epoch 6 / 20:  45%|████▌     | 704/1563 [00:19<00:26, 32.36it/s]

batch 700 loss: 0.43225671350955963


Train, Epoch 6 / 20:  46%|████▌     | 716/1563 [00:20<00:26, 32.02it/s]

batch 710 loss: 0.5049399822950363


Train, Epoch 6 / 20:  46%|████▋     | 724/1563 [00:20<00:25, 32.36it/s]

batch 720 loss: 0.43294338434934615


Train, Epoch 6 / 20:  47%|████▋     | 736/1563 [00:20<00:25, 32.53it/s]

batch 730 loss: 0.42326071560382844


Train, Epoch 6 / 20:  48%|████▊     | 744/1563 [00:21<00:25, 32.31it/s]

batch 740 loss: 0.5564047247171402


Train, Epoch 6 / 20:  48%|████▊     | 756/1563 [00:21<00:24, 32.52it/s]

batch 750 loss: 0.5392902493476868


Train, Epoch 6 / 20:  49%|████▉     | 764/1563 [00:21<00:23, 34.32it/s]

batch 760 loss: 0.4046248227357864


Train, Epoch 6 / 20:  50%|████▉     | 776/1563 [00:22<00:22, 35.01it/s]

batch 770 loss: 0.5019261449575424


Train, Epoch 6 / 20:  50%|█████     | 784/1563 [00:22<00:21, 35.78it/s]

batch 780 loss: 0.4028895303606987


Train, Epoch 6 / 20:  51%|█████     | 796/1563 [00:22<00:21, 35.81it/s]

batch 790 loss: 0.5126371204853057


Train, Epoch 6 / 20:  51%|█████▏    | 804/1563 [00:22<00:21, 35.72it/s]

batch 800 loss: 0.5179834306240082


Train, Epoch 6 / 20:  52%|█████▏    | 816/1563 [00:23<00:20, 35.96it/s]

batch 810 loss: 0.47337140142917633


Train, Epoch 6 / 20:  53%|█████▎    | 824/1563 [00:23<00:20, 36.10it/s]

batch 820 loss: 0.4298726111650467


Train, Epoch 6 / 20:  53%|█████▎    | 836/1563 [00:23<00:20, 35.68it/s]

batch 830 loss: 0.5301953226327896


Train, Epoch 6 / 20:  54%|█████▍    | 844/1563 [00:23<00:20, 35.40it/s]

batch 840 loss: 0.4740995854139328


Train, Epoch 6 / 20:  55%|█████▍    | 856/1563 [00:24<00:19, 36.04it/s]

batch 850 loss: 0.4546578139066696


Train, Epoch 6 / 20:  55%|█████▌    | 864/1563 [00:24<00:19, 36.50it/s]

batch 860 loss: 0.4535566419363022


Train, Epoch 6 / 20:  56%|█████▌    | 876/1563 [00:24<00:19, 36.14it/s]

batch 870 loss: 0.4401379287242889


Train, Epoch 6 / 20:  57%|█████▋    | 884/1563 [00:25<00:18, 36.33it/s]

batch 880 loss: 0.5184504002332687


Train, Epoch 6 / 20:  57%|█████▋    | 896/1563 [00:25<00:18, 36.10it/s]

batch 890 loss: 0.4886853188276291


Train, Epoch 6 / 20:  58%|█████▊    | 904/1563 [00:25<00:18, 35.96it/s]

batch 900 loss: 0.4320610523223877


Train, Epoch 6 / 20:  59%|█████▊    | 916/1563 [00:25<00:17, 36.30it/s]

batch 910 loss: 0.4324468642473221


Train, Epoch 6 / 20:  59%|█████▉    | 924/1563 [00:26<00:17, 36.49it/s]

batch 920 loss: 0.5616074144840241


Train, Epoch 6 / 20:  60%|█████▉    | 936/1563 [00:26<00:17, 36.50it/s]

batch 930 loss: 0.5019977003335953


Train, Epoch 6 / 20:  60%|██████    | 944/1563 [00:26<00:16, 36.66it/s]

batch 940 loss: 0.5278231829404831


Train, Epoch 6 / 20:  61%|██████    | 956/1563 [00:27<00:16, 36.38it/s]

batch 950 loss: 0.40880411565303804


Train, Epoch 6 / 20:  62%|██████▏   | 964/1563 [00:27<00:16, 36.41it/s]

batch 960 loss: 0.4371291518211365


Train, Epoch 6 / 20:  62%|██████▏   | 976/1563 [00:27<00:15, 36.83it/s]

batch 970 loss: 0.496764212846756


Train, Epoch 6 / 20:  63%|██████▎   | 984/1563 [00:27<00:15, 36.74it/s]

batch 980 loss: 0.45200625658035276


Train, Epoch 6 / 20:  64%|██████▎   | 996/1563 [00:28<00:15, 36.34it/s]

batch 990 loss: 0.42582816779613497


Train, Epoch 6 / 20:  64%|██████▍   | 1004/1563 [00:28<00:15, 36.72it/s]

batch 1000 loss: 0.4713982120156288


Train, Epoch 6 / 20:  65%|██████▌   | 1016/1563 [00:28<00:15, 36.36it/s]

batch 1010 loss: 0.4116168260574341


Train, Epoch 6 / 20:  66%|██████▌   | 1024/1563 [00:28<00:14, 36.53it/s]

batch 1020 loss: 0.47818115949630735


Train, Epoch 6 / 20:  66%|██████▋   | 1036/1563 [00:29<00:14, 36.24it/s]

batch 1030 loss: 0.48872706294059753


Train, Epoch 6 / 20:  67%|██████▋   | 1044/1563 [00:29<00:14, 36.66it/s]

batch 1040 loss: 0.5018874615430832


Train, Epoch 6 / 20:  68%|██████▊   | 1056/1563 [00:29<00:13, 36.59it/s]

batch 1050 loss: 0.47355779111385343


Train, Epoch 6 / 20:  68%|██████▊   | 1064/1563 [00:30<00:13, 36.38it/s]

batch 1060 loss: 0.43258305490016935


Train, Epoch 6 / 20:  69%|██████▉   | 1076/1563 [00:30<00:13, 36.12it/s]

batch 1070 loss: 0.5100795686244964


Train, Epoch 6 / 20:  69%|██████▉   | 1084/1563 [00:30<00:13, 36.68it/s]

batch 1080 loss: 0.46589629650115966


Train, Epoch 6 / 20:  70%|███████   | 1096/1563 [00:30<00:12, 36.56it/s]

batch 1090 loss: 0.5620923072099686


Train, Epoch 6 / 20:  71%|███████   | 1104/1563 [00:31<00:12, 35.61it/s]

batch 1100 loss: 0.4418420523405075


Train, Epoch 6 / 20:  71%|███████▏  | 1116/1563 [00:31<00:12, 35.87it/s]

batch 1110 loss: 0.41296959668397903


Train, Epoch 6 / 20:  72%|███████▏  | 1124/1563 [00:31<00:12, 34.01it/s]

batch 1120 loss: 0.43898507952690125


Train, Epoch 6 / 20:  73%|███████▎  | 1136/1563 [00:32<00:13, 32.49it/s]

batch 1130 loss: 0.4421011686325073


Train, Epoch 6 / 20:  73%|███████▎  | 1144/1563 [00:32<00:13, 32.02it/s]

batch 1140 loss: 0.45522828549146654


Train, Epoch 6 / 20:  74%|███████▍  | 1156/1563 [00:32<00:12, 32.62it/s]

batch 1150 loss: 0.37799866795539855


Train, Epoch 6 / 20:  74%|███████▍  | 1164/1563 [00:32<00:12, 32.86it/s]

batch 1160 loss: 0.4207294464111328


Train, Epoch 6 / 20:  75%|███████▌  | 1176/1563 [00:33<00:12, 32.15it/s]

batch 1170 loss: 0.4744372308254242


Train, Epoch 6 / 20:  76%|███████▌  | 1184/1563 [00:33<00:11, 31.60it/s]

batch 1180 loss: 0.4901563376188278


Train, Epoch 6 / 20:  77%|███████▋  | 1196/1563 [00:33<00:11, 33.03it/s]

batch 1190 loss: 0.4411690294742584


Train, Epoch 6 / 20:  77%|███████▋  | 1204/1563 [00:34<00:10, 33.82it/s]

batch 1200 loss: 0.5201652765274047


Train, Epoch 6 / 20:  78%|███████▊  | 1216/1563 [00:34<00:09, 35.33it/s]

batch 1210 loss: 0.47946925461292267


Train, Epoch 6 / 20:  78%|███████▊  | 1224/1563 [00:34<00:09, 35.10it/s]

batch 1220 loss: 0.5473014533519744


Train, Epoch 6 / 20:  79%|███████▉  | 1236/1563 [00:35<00:09, 35.62it/s]

batch 1230 loss: 0.41463176906108856


Train, Epoch 6 / 20:  80%|███████▉  | 1244/1563 [00:35<00:09, 35.39it/s]

batch 1240 loss: 0.44666742384433744


Train, Epoch 6 / 20:  80%|████████  | 1256/1563 [00:35<00:08, 35.93it/s]

batch 1250 loss: 0.4742770463228226


Train, Epoch 6 / 20:  81%|████████  | 1264/1563 [00:35<00:08, 36.12it/s]

batch 1260 loss: 0.47383566200733185


Train, Epoch 6 / 20:  82%|████████▏ | 1276/1563 [00:36<00:07, 36.03it/s]

batch 1270 loss: 0.41572139263153074


Train, Epoch 6 / 20:  82%|████████▏ | 1284/1563 [00:36<00:07, 35.94it/s]

batch 1280 loss: 0.4526951164007187


Train, Epoch 6 / 20:  83%|████████▎ | 1296/1563 [00:36<00:07, 36.62it/s]

batch 1290 loss: 0.44739688038825987


Train, Epoch 6 / 20:  83%|████████▎ | 1304/1563 [00:36<00:07, 36.34it/s]

batch 1300 loss: 0.43812852948904035


Train, Epoch 6 / 20:  84%|████████▍ | 1316/1563 [00:37<00:06, 36.76it/s]

batch 1310 loss: 0.4269104838371277


Train, Epoch 6 / 20:  85%|████████▍ | 1324/1563 [00:37<00:06, 36.69it/s]

batch 1320 loss: 0.4238851860165596


Train, Epoch 6 / 20:  85%|████████▌ | 1336/1563 [00:37<00:06, 36.60it/s]

batch 1330 loss: 0.4243769943714142


Train, Epoch 6 / 20:  86%|████████▌ | 1344/1563 [00:38<00:06, 36.46it/s]

batch 1340 loss: 0.44261320531368253


Train, Epoch 6 / 20:  87%|████████▋ | 1356/1563 [00:38<00:05, 36.26it/s]

batch 1350 loss: 0.4464004188776016


Train, Epoch 6 / 20:  87%|████████▋ | 1364/1563 [00:38<00:05, 36.56it/s]

batch 1360 loss: 0.4474951773881912


Train, Epoch 6 / 20:  88%|████████▊ | 1376/1563 [00:38<00:05, 36.60it/s]

batch 1370 loss: 0.43196197748184206


Train, Epoch 6 / 20:  89%|████████▊ | 1384/1563 [00:39<00:04, 36.48it/s]

batch 1380 loss: 0.4284885793924332


Train, Epoch 6 / 20:  89%|████████▉ | 1396/1563 [00:39<00:04, 35.88it/s]

batch 1390 loss: 0.45477727651596067


Train, Epoch 6 / 20:  90%|████████▉ | 1404/1563 [00:39<00:04, 36.25it/s]

batch 1400 loss: 0.4654898554086685


Train, Epoch 6 / 20:  91%|█████████ | 1416/1563 [00:40<00:04, 36.39it/s]

batch 1410 loss: 0.475249645113945


Train, Epoch 6 / 20:  91%|█████████ | 1424/1563 [00:40<00:03, 36.18it/s]

batch 1420 loss: 0.4462019145488739


Train, Epoch 6 / 20:  92%|█████████▏| 1436/1563 [00:40<00:03, 36.03it/s]

batch 1430 loss: 0.5010514289140702


Train, Epoch 6 / 20:  92%|█████████▏| 1444/1563 [00:40<00:03, 36.24it/s]

batch 1440 loss: 0.5104643136262894


Train, Epoch 6 / 20:  93%|█████████▎| 1456/1563 [00:41<00:02, 36.33it/s]

batch 1450 loss: 0.43215687572956085


Train, Epoch 6 / 20:  94%|█████████▎| 1464/1563 [00:41<00:02, 35.24it/s]

batch 1460 loss: 0.42504705488681793


Train, Epoch 6 / 20:  94%|█████████▍| 1476/1563 [00:41<00:02, 35.94it/s]

batch 1470 loss: 0.46909177452325823


Train, Epoch 6 / 20:  95%|█████████▍| 1484/1563 [00:41<00:02, 36.21it/s]

batch 1480 loss: 0.5617788106203079


Train, Epoch 6 / 20:  96%|█████████▌| 1496/1563 [00:42<00:01, 35.96it/s]

batch 1490 loss: 0.46933750808238983


Train, Epoch 6 / 20:  96%|█████████▌| 1504/1563 [00:42<00:01, 36.07it/s]

batch 1500 loss: 0.5197392016649246


Train, Epoch 6 / 20:  97%|█████████▋| 1516/1563 [00:42<00:01, 36.18it/s]

batch 1510 loss: 0.4377260833978653


Train, Epoch 6 / 20:  98%|█████████▊| 1524/1563 [00:43<00:01, 36.27it/s]

batch 1520 loss: 0.44949260354042053


Train, Epoch 6 / 20:  98%|█████████▊| 1536/1563 [00:43<00:00, 36.44it/s]

batch 1530 loss: 0.4317566841840744


Train, Epoch 6 / 20:  99%|█████████▉| 1544/1563 [00:43<00:00, 36.96it/s]

batch 1540 loss: 0.48109619319438934


Train, Epoch 6 / 20: 100%|█████████▉| 1556/1563 [00:43<00:00, 35.72it/s]

batch 1550 loss: 0.4089661180973053


Train, Epoch 6 / 20: 100%|██████████| 1563/1563 [00:44<00:00, 35.41it/s]


batch 1560 loss: 0.4822920024394989


Test, Epoch 6 / 20: 100%|██████████| 1563/1563 [00:21<00:00, 74.24it/s]


Epoch 6, loss: 0.4949911723470688, accuracy: 0.76156


Train, Epoch 7 / 20:   1%|          | 16/1563 [00:00<00:42, 36.35it/s]

batch 10 loss: 0.5040135741233825


Train, Epoch 7 / 20:   2%|▏         | 24/1563 [00:00<00:42, 36.33it/s]

batch 20 loss: 0.5068443670868874


Train, Epoch 7 / 20:   2%|▏         | 36/1563 [00:00<00:41, 36.37it/s]

batch 30 loss: 0.39566386938095094


Train, Epoch 7 / 20:   3%|▎         | 44/1563 [00:01<00:42, 36.03it/s]

batch 40 loss: 0.47265611588954926


Train, Epoch 7 / 20:   4%|▎         | 56/1563 [00:01<00:40, 36.85it/s]

batch 50 loss: 0.47968657314777374


Train, Epoch 7 / 20:   4%|▍         | 64/1563 [00:01<00:41, 36.52it/s]

batch 60 loss: 0.5192603021860123


Train, Epoch 7 / 20:   5%|▍         | 76/1563 [00:02<00:40, 36.62it/s]

batch 70 loss: 0.41308975517749785


Train, Epoch 7 / 20:   5%|▌         | 84/1563 [00:02<00:40, 36.49it/s]

batch 80 loss: 0.4604485809803009


Train, Epoch 7 / 20:   6%|▌         | 96/1563 [00:02<00:40, 36.63it/s]

batch 90 loss: 0.39242657721042634


Train, Epoch 7 / 20:   7%|▋         | 104/1563 [00:02<00:40, 36.28it/s]

batch 100 loss: 0.38559460639953613


Train, Epoch 7 / 20:   7%|▋         | 116/1563 [00:03<00:40, 36.05it/s]

batch 110 loss: 0.5015076577663422


Train, Epoch 7 / 20:   8%|▊         | 124/1563 [00:03<00:40, 35.24it/s]

batch 120 loss: 0.44811734855175017


Train, Epoch 7 / 20:   9%|▊         | 136/1563 [00:03<00:42, 33.31it/s]

batch 130 loss: 0.4314290419220924


Train, Epoch 7 / 20:   9%|▉         | 144/1563 [00:04<00:44, 31.59it/s]

batch 140 loss: 0.42670887112617495


Train, Epoch 7 / 20:  10%|▉         | 156/1563 [00:04<00:45, 31.01it/s]

batch 150 loss: 0.48121780157089233


Train, Epoch 7 / 20:  10%|█         | 164/1563 [00:04<00:45, 30.92it/s]

batch 160 loss: 0.4752449423074722


Train, Epoch 7 / 20:  11%|█▏        | 176/1563 [00:05<00:44, 31.48it/s]

batch 170 loss: 0.41167476773262024


Train, Epoch 7 / 20:  12%|█▏        | 184/1563 [00:05<00:43, 31.65it/s]

batch 180 loss: 0.48412847220897676


Train, Epoch 7 / 20:  13%|█▎        | 196/1563 [00:05<00:43, 31.46it/s]

batch 190 loss: 0.5021188765764236


Train, Epoch 7 / 20:  13%|█▎        | 204/1563 [00:05<00:40, 33.68it/s]

batch 200 loss: 0.4857257679104805


Train, Epoch 7 / 20:  14%|█▍        | 216/1563 [00:06<00:37, 35.53it/s]

batch 210 loss: 0.3902562603354454


Train, Epoch 7 / 20:  14%|█▍        | 224/1563 [00:06<00:36, 36.26it/s]

batch 220 loss: 0.43694859445095063


Train, Epoch 7 / 20:  15%|█▌        | 236/1563 [00:06<00:36, 36.30it/s]

batch 230 loss: 0.43198132365942


Train, Epoch 7 / 20:  16%|█▌        | 244/1563 [00:07<00:36, 36.20it/s]

batch 240 loss: 0.41531542539596555


Train, Epoch 7 / 20:  16%|█▋        | 256/1563 [00:07<00:35, 36.57it/s]

batch 250 loss: 0.5041398704051971


Train, Epoch 7 / 20:  17%|█▋        | 264/1563 [00:07<00:35, 36.65it/s]

batch 260 loss: 0.4035138338804245


Train, Epoch 7 / 20:  18%|█▊        | 276/1563 [00:07<00:35, 36.63it/s]

batch 270 loss: 0.466679984331131


Train, Epoch 7 / 20:  18%|█▊        | 284/1563 [00:08<00:35, 35.89it/s]

batch 280 loss: 0.4698103994131088


Train, Epoch 7 / 20:  19%|█▉        | 296/1563 [00:08<00:34, 36.54it/s]

batch 290 loss: 0.4682604521512985


Train, Epoch 7 / 20:  19%|█▉        | 304/1563 [00:08<00:34, 36.35it/s]

batch 300 loss: 0.4964447617530823


Train, Epoch 7 / 20:  20%|██        | 316/1563 [00:09<00:34, 35.86it/s]

batch 310 loss: 0.37748060971498487


Train, Epoch 7 / 20:  21%|██        | 324/1563 [00:09<00:34, 35.63it/s]

batch 320 loss: 0.4436478018760681


Train, Epoch 7 / 20:  21%|██▏       | 336/1563 [00:09<00:33, 36.18it/s]

batch 330 loss: 0.38744298219680784


Train, Epoch 7 / 20:  22%|██▏       | 344/1563 [00:09<00:33, 36.36it/s]

batch 340 loss: 0.45724948644638064


Train, Epoch 7 / 20:  23%|██▎       | 356/1563 [00:10<00:33, 36.13it/s]

batch 350 loss: 0.5822031825780869


Train, Epoch 7 / 20:  23%|██▎       | 364/1563 [00:10<00:33, 36.26it/s]

batch 360 loss: 0.40423795729875567


Train, Epoch 7 / 20:  24%|██▍       | 376/1563 [00:10<00:32, 36.42it/s]

batch 370 loss: 0.3899067550897598


Train, Epoch 7 / 20:  25%|██▍       | 384/1563 [00:10<00:32, 35.91it/s]

batch 380 loss: 0.47314642816782


Train, Epoch 7 / 20:  25%|██▌       | 396/1563 [00:11<00:32, 35.68it/s]

batch 390 loss: 0.46760382652282717


Train, Epoch 7 / 20:  26%|██▌       | 404/1563 [00:11<00:31, 36.47it/s]

batch 400 loss: 0.44451669603586197


Train, Epoch 7 / 20:  27%|██▋       | 416/1563 [00:11<00:31, 36.15it/s]

batch 410 loss: 0.4122251093387604


Train, Epoch 7 / 20:  27%|██▋       | 424/1563 [00:12<00:31, 36.44it/s]

batch 420 loss: 0.4178639858961105


Train, Epoch 7 / 20:  28%|██▊       | 436/1563 [00:12<00:31, 35.88it/s]

batch 430 loss: 0.42761376202106477


Train, Epoch 7 / 20:  28%|██▊       | 444/1563 [00:12<00:31, 35.89it/s]

batch 440 loss: 0.40189904570579527


Train, Epoch 7 / 20:  29%|██▉       | 456/1563 [00:12<00:30, 36.22it/s]

batch 450 loss: 0.5313367545604706


Train, Epoch 7 / 20:  30%|██▉       | 464/1563 [00:13<00:30, 35.95it/s]

batch 460 loss: 0.41266406774520875


Train, Epoch 7 / 20:  30%|███       | 476/1563 [00:13<00:30, 36.05it/s]

batch 470 loss: 0.4857044965028763


Train, Epoch 7 / 20:  31%|███       | 484/1563 [00:13<00:29, 36.22it/s]

batch 480 loss: 0.46765309274196626


Train, Epoch 7 / 20:  32%|███▏      | 496/1563 [00:14<00:29, 36.35it/s]

batch 490 loss: 0.4955617368221283


Train, Epoch 7 / 20:  32%|███▏      | 504/1563 [00:14<00:29, 35.80it/s]

batch 500 loss: 0.4394949465990067


Train, Epoch 7 / 20:  33%|███▎      | 516/1563 [00:14<00:29, 35.35it/s]

batch 510 loss: 0.47109163403511045


Train, Epoch 7 / 20:  34%|███▎      | 524/1563 [00:14<00:29, 35.63it/s]

batch 520 loss: 0.49250557124614713


Train, Epoch 7 / 20:  34%|███▍      | 536/1563 [00:15<00:28, 35.48it/s]

batch 530 loss: 0.4276806801557541


Train, Epoch 7 / 20:  35%|███▍      | 544/1563 [00:15<00:28, 36.03it/s]

batch 540 loss: 0.43238343596458434


Train, Epoch 7 / 20:  36%|███▌      | 556/1563 [00:15<00:28, 35.82it/s]

batch 550 loss: 0.532613092660904


Train, Epoch 7 / 20:  36%|███▌      | 564/1563 [00:15<00:29, 34.22it/s]

batch 560 loss: 0.42378163486719134


Train, Epoch 7 / 20:  37%|███▋      | 576/1563 [00:16<00:30, 32.81it/s]

batch 570 loss: 0.4330462023615837


Train, Epoch 7 / 20:  37%|███▋      | 584/1563 [00:16<00:30, 31.82it/s]

batch 580 loss: 0.476416876912117


Train, Epoch 7 / 20:  38%|███▊      | 596/1563 [00:16<00:29, 32.27it/s]

batch 590 loss: 0.5586168736219406


Train, Epoch 7 / 20:  39%|███▊      | 604/1563 [00:17<00:29, 32.56it/s]

batch 600 loss: 0.5264455527067184


Train, Epoch 7 / 20:  39%|███▉      | 616/1563 [00:17<00:29, 32.48it/s]

batch 610 loss: 0.40497414767742157


Train, Epoch 7 / 20:  40%|███▉      | 624/1563 [00:17<00:29, 31.96it/s]

batch 620 loss: 0.506177531182766


Train, Epoch 7 / 20:  41%|████      | 636/1563 [00:18<00:27, 34.13it/s]

batch 630 loss: 0.45294331312179564


Train, Epoch 7 / 20:  41%|████      | 644/1563 [00:18<00:26, 34.88it/s]

batch 640 loss: 0.49863320887088775


Train, Epoch 7 / 20:  42%|████▏     | 656/1563 [00:18<00:25, 35.99it/s]

batch 650 loss: 0.5029049098491669


Train, Epoch 7 / 20:  42%|████▏     | 664/1563 [00:18<00:25, 35.88it/s]

batch 660 loss: 0.39328323900699613


Train, Epoch 7 / 20:  43%|████▎     | 676/1563 [00:19<00:24, 35.79it/s]

batch 670 loss: 0.443790203332901


Train, Epoch 7 / 20:  44%|████▍     | 684/1563 [00:19<00:24, 35.79it/s]

batch 680 loss: 0.41110112220048906


Train, Epoch 7 / 20:  45%|████▍     | 696/1563 [00:19<00:23, 36.28it/s]

batch 690 loss: 0.4008347183465958


Train, Epoch 7 / 20:  45%|████▌     | 704/1563 [00:20<00:23, 36.30it/s]

batch 700 loss: 0.47994457334280016


Train, Epoch 7 / 20:  46%|████▌     | 716/1563 [00:20<00:23, 36.50it/s]

batch 710 loss: 0.41684250980615617


Train, Epoch 7 / 20:  46%|████▋     | 724/1563 [00:20<00:22, 36.71it/s]

batch 720 loss: 0.48474189043045046


Train, Epoch 7 / 20:  47%|████▋     | 736/1563 [00:20<00:22, 36.76it/s]

batch 730 loss: 0.3841642141342163


Train, Epoch 7 / 20:  48%|████▊     | 744/1563 [00:21<00:22, 36.59it/s]

batch 740 loss: 0.3887003600597382


Train, Epoch 7 / 20:  48%|████▊     | 756/1563 [00:21<00:22, 36.30it/s]

batch 750 loss: 0.45843663811683655


Train, Epoch 7 / 20:  49%|████▉     | 764/1563 [00:21<00:22, 35.58it/s]

batch 760 loss: 0.3952721208333969


Train, Epoch 7 / 20:  50%|████▉     | 776/1563 [00:22<00:22, 35.57it/s]

batch 770 loss: 0.44860631227493286


Train, Epoch 7 / 20:  50%|█████     | 784/1563 [00:22<00:21, 35.61it/s]

batch 780 loss: 0.4232362613081932


Train, Epoch 7 / 20:  51%|█████     | 796/1563 [00:22<00:21, 36.00it/s]

batch 790 loss: 0.4996441096067429


Train, Epoch 7 / 20:  51%|█████▏    | 804/1563 [00:22<00:20, 36.18it/s]

batch 800 loss: 0.4638880342245102


Train, Epoch 7 / 20:  52%|█████▏    | 816/1563 [00:23<00:20, 36.20it/s]

batch 810 loss: 0.41975980848073957


Train, Epoch 7 / 20:  53%|█████▎    | 824/1563 [00:23<00:20, 36.25it/s]

batch 820 loss: 0.4119703933596611


Train, Epoch 7 / 20:  53%|█████▎    | 836/1563 [00:23<00:20, 35.95it/s]

batch 830 loss: 0.42846270799636843


Train, Epoch 7 / 20:  54%|█████▍    | 844/1563 [00:23<00:19, 36.14it/s]

batch 840 loss: 0.4147219479084015


Train, Epoch 7 / 20:  55%|█████▍    | 856/1563 [00:24<00:19, 36.24it/s]

batch 850 loss: 0.4456121727824211


Train, Epoch 7 / 20:  55%|█████▌    | 864/1563 [00:24<00:19, 36.15it/s]

batch 860 loss: 0.42956252098083497


Train, Epoch 7 / 20:  56%|█████▌    | 876/1563 [00:24<00:18, 36.38it/s]

batch 870 loss: 0.46397503912448884


Train, Epoch 7 / 20:  57%|█████▋    | 884/1563 [00:25<00:19, 35.60it/s]

batch 880 loss: 0.4541088089346886


Train, Epoch 7 / 20:  57%|█████▋    | 896/1563 [00:25<00:18, 35.90it/s]

batch 890 loss: 0.41683968156576157


Train, Epoch 7 / 20:  58%|█████▊    | 904/1563 [00:25<00:18, 35.66it/s]

batch 900 loss: 0.4017424449324608


Train, Epoch 7 / 20:  59%|█████▊    | 916/1563 [00:25<00:18, 35.76it/s]

batch 910 loss: 0.46204398572444916


Train, Epoch 7 / 20:  59%|█████▉    | 924/1563 [00:26<00:18, 35.17it/s]

batch 920 loss: 0.41068568229675295


Train, Epoch 7 / 20:  60%|█████▉    | 936/1563 [00:26<00:17, 35.56it/s]

batch 930 loss: 0.5055757403373718


Train, Epoch 7 / 20:  60%|██████    | 944/1563 [00:26<00:17, 35.76it/s]

batch 940 loss: 0.4693216621875763


Train, Epoch 7 / 20:  61%|██████    | 956/1563 [00:27<00:17, 35.52it/s]

batch 950 loss: 0.5062367647886277


Train, Epoch 7 / 20:  62%|██████▏   | 964/1563 [00:27<00:16, 35.95it/s]

batch 960 loss: 0.43235890865325927


Train, Epoch 7 / 20:  62%|██████▏   | 976/1563 [00:27<00:16, 35.49it/s]

batch 970 loss: 0.42350754141807556


Train, Epoch 7 / 20:  63%|██████▎   | 984/1563 [00:27<00:16, 35.23it/s]

batch 980 loss: 0.3942607581615448


Train, Epoch 7 / 20:  64%|██████▎   | 996/1563 [00:28<00:16, 33.81it/s]

batch 990 loss: 0.4838148862123489


Train, Epoch 7 / 20:  64%|██████▍   | 1004/1563 [00:28<00:16, 33.44it/s]

batch 1000 loss: 0.4377752959728241


Train, Epoch 7 / 20:  65%|██████▌   | 1016/1563 [00:28<00:16, 33.32it/s]

batch 1010 loss: 0.5318668633699417


Train, Epoch 7 / 20:  66%|██████▌   | 1024/1563 [00:29<00:16, 33.11it/s]

batch 1020 loss: 0.5001311987638474


Train, Epoch 7 / 20:  66%|██████▋   | 1036/1563 [00:29<00:15, 33.26it/s]

batch 1030 loss: 0.43658881187438964


Train, Epoch 7 / 20:  67%|██████▋   | 1044/1563 [00:29<00:15, 32.47it/s]

batch 1040 loss: 0.4380017459392548


Train, Epoch 7 / 20:  68%|██████▊   | 1056/1563 [00:30<00:15, 32.10it/s]

batch 1050 loss: 0.45096866488456727


Train, Epoch 7 / 20:  68%|██████▊   | 1064/1563 [00:30<00:15, 31.78it/s]

batch 1060 loss: 0.41960996091365815


Train, Epoch 7 / 20:  69%|██████▉   | 1076/1563 [00:30<00:14, 34.34it/s]

batch 1070 loss: 0.4385921359062195


Train, Epoch 7 / 20:  69%|██████▉   | 1084/1563 [00:30<00:13, 35.14it/s]

batch 1080 loss: 0.38575938642024993


Train, Epoch 7 / 20:  70%|███████   | 1096/1563 [00:31<00:12, 36.18it/s]

batch 1090 loss: 0.5125752866268158


Train, Epoch 7 / 20:  71%|███████   | 1104/1563 [00:31<00:12, 36.41it/s]

batch 1100 loss: 0.4095547154545784


Train, Epoch 7 / 20:  71%|███████▏  | 1116/1563 [00:31<00:12, 35.18it/s]

batch 1110 loss: 0.4680349573493004


Train, Epoch 7 / 20:  72%|███████▏  | 1124/1563 [00:32<00:12, 35.46it/s]

batch 1120 loss: 0.4263972669839859


Train, Epoch 7 / 20:  73%|███████▎  | 1136/1563 [00:32<00:11, 36.29it/s]

batch 1130 loss: 0.43433817923069


Train, Epoch 7 / 20:  73%|███████▎  | 1144/1563 [00:32<00:11, 36.39it/s]

batch 1140 loss: 0.34781535267829894


Train, Epoch 7 / 20:  74%|███████▍  | 1156/1563 [00:32<00:11, 36.32it/s]

batch 1150 loss: 0.41043716967105864


Train, Epoch 7 / 20:  74%|███████▍  | 1164/1563 [00:33<00:11, 35.92it/s]

batch 1160 loss: 0.5194370448589325


Train, Epoch 7 / 20:  75%|███████▌  | 1176/1563 [00:33<00:10, 36.34it/s]

batch 1170 loss: 0.3897865116596222


Train, Epoch 7 / 20:  76%|███████▌  | 1184/1563 [00:33<00:10, 36.38it/s]

batch 1180 loss: 0.4105395883321762


Train, Epoch 7 / 20:  77%|███████▋  | 1196/1563 [00:33<00:10, 36.27it/s]

batch 1190 loss: 0.40505382791161537


Train, Epoch 7 / 20:  77%|███████▋  | 1204/1563 [00:34<00:10, 35.83it/s]

batch 1200 loss: 0.5734567195177078


Train, Epoch 7 / 20:  78%|███████▊  | 1216/1563 [00:34<00:09, 35.89it/s]

batch 1210 loss: 0.497478124499321


Train, Epoch 7 / 20:  78%|███████▊  | 1224/1563 [00:34<00:09, 35.44it/s]

batch 1220 loss: 0.41010632961988447


Train, Epoch 7 / 20:  79%|███████▉  | 1236/1563 [00:35<00:09, 35.64it/s]

batch 1230 loss: 0.4096763700246811


Train, Epoch 7 / 20:  80%|███████▉  | 1244/1563 [00:35<00:08, 35.94it/s]

batch 1240 loss: 0.4890177994966507


Train, Epoch 7 / 20:  80%|████████  | 1256/1563 [00:35<00:08, 36.14it/s]

batch 1250 loss: 0.4808421164751053


Train, Epoch 7 / 20:  81%|████████  | 1264/1563 [00:35<00:08, 36.01it/s]

batch 1260 loss: 0.3913200110197067


Train, Epoch 7 / 20:  82%|████████▏ | 1276/1563 [00:36<00:08, 35.73it/s]

batch 1270 loss: 0.42255466282367704


Train, Epoch 7 / 20:  82%|████████▏ | 1284/1563 [00:36<00:07, 36.18it/s]

batch 1280 loss: 0.41216988265514376


Train, Epoch 7 / 20:  83%|████████▎ | 1296/1563 [00:36<00:07, 36.24it/s]

batch 1290 loss: 0.36508694142103193


Train, Epoch 7 / 20:  83%|████████▎ | 1304/1563 [00:37<00:07, 34.86it/s]

batch 1300 loss: 0.4328953713178635


Train, Epoch 7 / 20:  84%|████████▍ | 1316/1563 [00:37<00:06, 35.89it/s]

batch 1310 loss: 0.4459924191236496


Train, Epoch 7 / 20:  85%|████████▍ | 1324/1563 [00:37<00:06, 36.34it/s]

batch 1320 loss: 0.451658108830452


Train, Epoch 7 / 20:  85%|████████▌ | 1336/1563 [00:37<00:06, 35.97it/s]

batch 1330 loss: 0.44086330235004423


Train, Epoch 7 / 20:  86%|████████▌ | 1344/1563 [00:38<00:06, 36.18it/s]

batch 1340 loss: 0.4495925709605217


Train, Epoch 7 / 20:  87%|████████▋ | 1356/1563 [00:38<00:05, 36.43it/s]

batch 1350 loss: 0.42541883885860443


Train, Epoch 7 / 20:  87%|████████▋ | 1364/1563 [00:38<00:05, 36.33it/s]

batch 1360 loss: 0.47541753202676773


Train, Epoch 7 / 20:  88%|████████▊ | 1376/1563 [00:39<00:05, 35.66it/s]

batch 1370 loss: 0.3773661971092224


Train, Epoch 7 / 20:  89%|████████▊ | 1384/1563 [00:39<00:05, 35.63it/s]

batch 1380 loss: 0.45392039716243743


Train, Epoch 7 / 20:  89%|████████▉ | 1396/1563 [00:39<00:04, 36.08it/s]

batch 1390 loss: 0.4360851839184761


Train, Epoch 7 / 20:  90%|████████▉ | 1404/1563 [00:39<00:04, 36.08it/s]

batch 1400 loss: 0.397402460873127


Train, Epoch 7 / 20:  91%|█████████ | 1416/1563 [00:40<00:04, 35.96it/s]

batch 1410 loss: 0.46729802489280703


Train, Epoch 7 / 20:  91%|█████████ | 1424/1563 [00:40<00:03, 34.78it/s]

batch 1420 loss: 0.4770641177892685


Train, Epoch 7 / 20:  92%|█████████▏| 1436/1563 [00:40<00:03, 33.81it/s]

batch 1430 loss: 0.48388158828020095


Train, Epoch 7 / 20:  92%|█████████▏| 1444/1563 [00:40<00:03, 32.70it/s]

batch 1440 loss: 0.41791001111269


Train, Epoch 7 / 20:  93%|█████████▎| 1456/1563 [00:41<00:03, 33.19it/s]

batch 1450 loss: 0.49569460153579714


Train, Epoch 7 / 20:  94%|█████████▎| 1464/1563 [00:41<00:02, 33.79it/s]

batch 1460 loss: 0.4699617773294449


Train, Epoch 7 / 20:  94%|█████████▍| 1476/1563 [00:41<00:02, 33.82it/s]

batch 1470 loss: 0.415128493309021


Train, Epoch 7 / 20:  95%|█████████▍| 1484/1563 [00:42<00:02, 32.38it/s]

batch 1480 loss: 0.3918068140745163


Train, Epoch 7 / 20:  96%|█████████▌| 1496/1563 [00:42<00:02, 31.72it/s]

batch 1490 loss: 0.3872589945793152


Train, Epoch 7 / 20:  96%|█████████▌| 1504/1563 [00:42<00:01, 32.10it/s]

batch 1500 loss: 0.4067309573292732


Train, Epoch 7 / 20:  97%|█████████▋| 1516/1563 [00:43<00:01, 34.46it/s]

batch 1510 loss: 0.5072085410356522


Train, Epoch 7 / 20:  98%|█████████▊| 1524/1563 [00:43<00:01, 35.42it/s]

batch 1520 loss: 0.46711298525333406


Train, Epoch 7 / 20:  98%|█████████▊| 1536/1563 [00:43<00:00, 35.29it/s]

batch 1530 loss: 0.3557981997728348


Train, Epoch 7 / 20:  99%|█████████▉| 1544/1563 [00:43<00:00, 35.64it/s]

batch 1540 loss: 0.45239644348621366


Train, Epoch 7 / 20: 100%|█████████▉| 1556/1563 [00:44<00:00, 36.15it/s]

batch 1550 loss: 0.43958962261676787


Train, Epoch 7 / 20: 100%|██████████| 1563/1563 [00:44<00:00, 35.17it/s]


batch 1560 loss: 0.450365449488163


Test, Epoch 7 / 20: 100%|██████████| 1563/1563 [00:20<00:00, 74.74it/s]


Epoch 7, loss: 0.47672319305062294, accuracy: 0.7758


Train, Epoch 8 / 20:   1%|          | 14/1563 [00:00<00:51, 30.18it/s]

batch 10 loss: 0.49058442413806913


Train, Epoch 8 / 20:   2%|▏         | 26/1563 [00:00<00:47, 32.36it/s]

batch 20 loss: 0.3915045663714409


Train, Epoch 8 / 20:   2%|▏         | 34/1563 [00:01<00:46, 32.81it/s]

batch 30 loss: 0.40598586201667786


Train, Epoch 8 / 20:   3%|▎         | 46/1563 [00:01<00:47, 31.64it/s]

batch 40 loss: 0.37112053036689757


Train, Epoch 8 / 20:   3%|▎         | 54/1563 [00:01<00:48, 30.88it/s]

batch 50 loss: 0.3673349440097809


Train, Epoch 8 / 20:   4%|▍         | 66/1563 [00:02<00:47, 31.35it/s]

batch 60 loss: 0.5200482547283173


Train, Epoch 8 / 20:   5%|▍         | 74/1563 [00:02<00:44, 33.21it/s]

batch 70 loss: 0.4345860332250595


Train, Epoch 8 / 20:   6%|▌         | 86/1563 [00:02<00:42, 34.87it/s]

batch 80 loss: 0.3797908887267113


Train, Epoch 8 / 20:   6%|▌         | 94/1563 [00:02<00:41, 35.51it/s]

batch 90 loss: 0.44710035920143126


Train, Epoch 8 / 20:   7%|▋         | 106/1563 [00:03<00:40, 36.01it/s]

batch 100 loss: 0.47464428544044496


Train, Epoch 8 / 20:   7%|▋         | 114/1563 [00:03<00:40, 35.95it/s]

batch 110 loss: 0.4265448898077011


Train, Epoch 8 / 20:   8%|▊         | 126/1563 [00:03<00:39, 36.19it/s]

batch 120 loss: 0.3963504761457443


Train, Epoch 8 / 20:   9%|▊         | 134/1563 [00:03<00:39, 36.48it/s]

batch 130 loss: 0.37252663671970365


Train, Epoch 8 / 20:   9%|▉         | 146/1563 [00:04<00:38, 36.86it/s]

batch 140 loss: 0.4469251349568367


Train, Epoch 8 / 20:  10%|▉         | 154/1563 [00:04<00:38, 36.32it/s]

batch 150 loss: 0.49682279825210574


Train, Epoch 8 / 20:  11%|█         | 166/1563 [00:04<00:38, 36.43it/s]

batch 160 loss: 0.48456842005252837


Train, Epoch 8 / 20:  11%|█         | 174/1563 [00:05<00:38, 36.53it/s]

batch 170 loss: 0.43379113674163816


Train, Epoch 8 / 20:  12%|█▏        | 186/1563 [00:05<00:38, 35.77it/s]

batch 180 loss: 0.4143182411789894


Train, Epoch 8 / 20:  12%|█▏        | 194/1563 [00:05<00:39, 35.10it/s]

batch 190 loss: 0.37472691386938095


Train, Epoch 8 / 20:  13%|█▎        | 206/1563 [00:05<00:37, 35.82it/s]

batch 200 loss: 0.42414902448654174


Train, Epoch 8 / 20:  14%|█▎        | 214/1563 [00:06<00:37, 35.57it/s]

batch 210 loss: 0.4397899344563484


Train, Epoch 8 / 20:  14%|█▍        | 226/1563 [00:06<00:37, 35.30it/s]

batch 220 loss: 0.459284146130085


Train, Epoch 8 / 20:  15%|█▍        | 234/1563 [00:06<00:37, 35.52it/s]

batch 230 loss: 0.39687644839286806


Train, Epoch 8 / 20:  16%|█▌        | 246/1563 [00:07<00:36, 36.46it/s]

batch 240 loss: 0.4611853152513504


Train, Epoch 8 / 20:  16%|█▋        | 254/1563 [00:07<00:35, 36.51it/s]

batch 250 loss: 0.38989545702934264


Train, Epoch 8 / 20:  17%|█▋        | 266/1563 [00:07<00:35, 36.11it/s]

batch 260 loss: 0.36952303946018217


Train, Epoch 8 / 20:  18%|█▊        | 274/1563 [00:07<00:35, 36.04it/s]

batch 270 loss: 0.39733865559101106


Train, Epoch 8 / 20:  18%|█▊        | 286/1563 [00:08<00:35, 35.85it/s]

batch 280 loss: 0.43325567096471784


Train, Epoch 8 / 20:  19%|█▉        | 294/1563 [00:08<00:35, 35.38it/s]

batch 290 loss: 0.39825309664011


Train, Epoch 8 / 20:  20%|█▉        | 306/1563 [00:08<00:34, 36.07it/s]

batch 300 loss: 0.40412930250167844


Train, Epoch 8 / 20:  20%|██        | 314/1563 [00:08<00:34, 36.00it/s]

batch 310 loss: 0.39874793738126757


Train, Epoch 8 / 20:  21%|██        | 326/1563 [00:09<00:33, 36.58it/s]

batch 320 loss: 0.3903160780668259


Train, Epoch 8 / 20:  21%|██▏       | 334/1563 [00:09<00:34, 35.96it/s]

batch 330 loss: 0.46325705051422117


Train, Epoch 8 / 20:  22%|██▏       | 346/1563 [00:09<00:33, 35.90it/s]

batch 340 loss: 0.530150355398655


Train, Epoch 8 / 20:  23%|██▎       | 354/1563 [00:10<00:33, 36.24it/s]

batch 350 loss: 0.37663752734661105


Train, Epoch 8 / 20:  23%|██▎       | 366/1563 [00:10<00:32, 36.35it/s]

batch 360 loss: 0.42227422147989274


Train, Epoch 8 / 20:  24%|██▍       | 374/1563 [00:10<00:33, 35.95it/s]

batch 370 loss: 0.5437171638011933


Train, Epoch 8 / 20:  25%|██▍       | 386/1563 [00:10<00:32, 36.64it/s]

batch 380 loss: 0.49669755101203916


Train, Epoch 8 / 20:  25%|██▌       | 394/1563 [00:11<00:32, 36.36it/s]

batch 390 loss: 0.3539540499448776


Train, Epoch 8 / 20:  26%|██▌       | 406/1563 [00:11<00:32, 35.95it/s]

batch 400 loss: 0.4299720495939255


Train, Epoch 8 / 20:  26%|██▋       | 414/1563 [00:11<00:31, 36.22it/s]

batch 410 loss: 0.430551615357399


Train, Epoch 8 / 20:  27%|██▋       | 426/1563 [00:12<00:31, 36.13it/s]

batch 420 loss: 0.3980768144130707


Train, Epoch 8 / 20:  28%|██▊       | 434/1563 [00:12<00:32, 34.51it/s]

batch 430 loss: 0.41333392411470415


Train, Epoch 8 / 20:  29%|██▊       | 446/1563 [00:12<00:33, 33.30it/s]

batch 440 loss: 0.3949970409274101


Train, Epoch 8 / 20:  29%|██▉       | 454/1563 [00:12<00:32, 33.87it/s]

batch 450 loss: 0.5293397724628448


Train, Epoch 8 / 20:  30%|██▉       | 466/1563 [00:13<00:32, 34.21it/s]

batch 460 loss: 0.41182024478912355


Train, Epoch 8 / 20:  30%|███       | 474/1563 [00:13<00:31, 34.42it/s]

batch 470 loss: 0.4931201934814453


Train, Epoch 8 / 20:  31%|███       | 486/1563 [00:13<00:32, 32.99it/s]

batch 480 loss: 0.4575587272644043


Train, Epoch 8 / 20:  32%|███▏      | 494/1563 [00:14<00:33, 31.93it/s]

batch 490 loss: 0.4875788360834122


Train, Epoch 8 / 20:  32%|███▏      | 506/1563 [00:14<00:32, 32.47it/s]

batch 500 loss: 0.47592596560716627


Train, Epoch 8 / 20:  33%|███▎      | 514/1563 [00:14<00:30, 34.24it/s]

batch 510 loss: 0.41893802732229235


Train, Epoch 8 / 20:  34%|███▎      | 526/1563 [00:15<00:29, 35.66it/s]

batch 520 loss: 0.48658759593963624


Train, Epoch 8 / 20:  34%|███▍      | 534/1563 [00:15<00:28, 35.93it/s]

batch 530 loss: 0.4671818375587463


Train, Epoch 8 / 20:  35%|███▍      | 546/1563 [00:15<00:27, 36.43it/s]

batch 540 loss: 0.3797658339142799


Train, Epoch 8 / 20:  35%|███▌      | 554/1563 [00:15<00:27, 36.27it/s]

batch 550 loss: 0.5182576090097427


Train, Epoch 8 / 20:  36%|███▌      | 566/1563 [00:16<00:27, 36.57it/s]

batch 560 loss: 0.4404448628425598


Train, Epoch 8 / 20:  37%|███▋      | 574/1563 [00:16<00:27, 36.56it/s]

batch 570 loss: 0.347712841629982


Train, Epoch 8 / 20:  37%|███▋      | 586/1563 [00:16<00:27, 35.72it/s]

batch 580 loss: 0.43719556331634524


Train, Epoch 8 / 20:  38%|███▊      | 594/1563 [00:16<00:27, 35.50it/s]

batch 590 loss: 0.42496027052402496


Train, Epoch 8 / 20:  39%|███▉      | 606/1563 [00:17<00:26, 36.12it/s]

batch 600 loss: 0.40921297669410706


Train, Epoch 8 / 20:  39%|███▉      | 614/1563 [00:17<00:26, 36.03it/s]

batch 610 loss: 0.5299478739500045


Train, Epoch 8 / 20:  40%|████      | 626/1563 [00:17<00:25, 36.36it/s]

batch 620 loss: 0.45502068996429446


Train, Epoch 8 / 20:  41%|████      | 634/1563 [00:18<00:25, 36.13it/s]

batch 630 loss: 0.44725360721349716


Train, Epoch 8 / 20:  41%|████▏     | 646/1563 [00:18<00:25, 36.10it/s]

batch 640 loss: 0.39770951569080354


Train, Epoch 8 / 20:  42%|████▏     | 654/1563 [00:18<00:25, 35.97it/s]

batch 650 loss: 0.4430883154273033


Train, Epoch 8 / 20:  43%|████▎     | 666/1563 [00:18<00:24, 35.89it/s]

batch 660 loss: 0.40506705492734907


Train, Epoch 8 / 20:  43%|████▎     | 674/1563 [00:19<00:24, 35.74it/s]

batch 670 loss: 0.36586771756410597


Train, Epoch 8 / 20:  44%|████▍     | 686/1563 [00:19<00:24, 35.50it/s]

batch 680 loss: 0.4490497976541519


Train, Epoch 8 / 20:  44%|████▍     | 694/1563 [00:19<00:24, 34.95it/s]

batch 690 loss: 0.46668879091739657


Train, Epoch 8 / 20:  45%|████▌     | 706/1563 [00:20<00:24, 35.31it/s]

batch 700 loss: 0.39978763163089753


Train, Epoch 8 / 20:  46%|████▌     | 714/1563 [00:20<00:24, 35.10it/s]

batch 710 loss: 0.42363650649785994


Train, Epoch 8 / 20:  46%|████▋     | 726/1563 [00:20<00:23, 36.09it/s]

batch 720 loss: 0.48990866243839265


Train, Epoch 8 / 20:  47%|████▋     | 734/1563 [00:20<00:23, 35.74it/s]

batch 730 loss: 0.4438962072134018


Train, Epoch 8 / 20:  48%|████▊     | 746/1563 [00:21<00:22, 35.75it/s]

batch 740 loss: 0.4924243092536926


Train, Epoch 8 / 20:  48%|████▊     | 754/1563 [00:21<00:22, 35.97it/s]

batch 750 loss: 0.33629609644412994


Train, Epoch 8 / 20:  49%|████▉     | 766/1563 [00:21<00:21, 36.37it/s]

batch 760 loss: 0.4150582402944565


Train, Epoch 8 / 20:  50%|████▉     | 774/1563 [00:21<00:22, 35.23it/s]

batch 770 loss: 0.3829272985458374


Train, Epoch 8 / 20:  50%|█████     | 786/1563 [00:22<00:21, 35.61it/s]

batch 780 loss: 0.3703346014022827


Train, Epoch 8 / 20:  51%|█████     | 794/1563 [00:22<00:21, 35.54it/s]

batch 790 loss: 0.48843670189380645


Train, Epoch 8 / 20:  52%|█████▏    | 806/1563 [00:22<00:20, 36.35it/s]

batch 800 loss: 0.43462308049201964


Train, Epoch 8 / 20:  52%|█████▏    | 814/1563 [00:23<00:21, 35.63it/s]

batch 810 loss: 0.42851869463920594


Train, Epoch 8 / 20:  53%|█████▎    | 826/1563 [00:23<00:20, 36.20it/s]

batch 820 loss: 0.3533143073320389


Train, Epoch 8 / 20:  53%|█████▎    | 834/1563 [00:23<00:20, 36.12it/s]

batch 830 loss: 0.3701161578297615


Train, Epoch 8 / 20:  54%|█████▍    | 846/1563 [00:23<00:19, 35.92it/s]

batch 840 loss: 0.3843123376369476


Train, Epoch 8 / 20:  55%|█████▍    | 854/1563 [00:24<00:19, 35.82it/s]

batch 850 loss: 0.39614138603210447


Train, Epoch 8 / 20:  55%|█████▌    | 866/1563 [00:24<00:19, 35.11it/s]

batch 860 loss: 0.40743096470832824


Train, Epoch 8 / 20:  56%|█████▌    | 874/1563 [00:24<00:20, 33.06it/s]

batch 870 loss: 0.42528402507305146


Train, Epoch 8 / 20:  57%|█████▋    | 886/1563 [00:25<00:20, 33.54it/s]

batch 880 loss: 0.452050606906414


Train, Epoch 8 / 20:  57%|█████▋    | 894/1563 [00:25<00:19, 33.54it/s]

batch 890 loss: 0.40710812658071516


Train, Epoch 8 / 20:  58%|█████▊    | 906/1563 [00:25<00:19, 33.22it/s]

batch 900 loss: 0.44025231897830963


Train, Epoch 8 / 20:  58%|█████▊    | 914/1563 [00:26<00:20, 32.14it/s]

batch 910 loss: 0.371035273373127


Train, Epoch 8 / 20:  59%|█████▉    | 926/1563 [00:26<00:20, 31.75it/s]

batch 920 loss: 0.4001548424363136


Train, Epoch 8 / 20:  60%|█████▉    | 934/1563 [00:26<00:20, 31.15it/s]

batch 930 loss: 0.4077275186777115


Train, Epoch 8 / 20:  61%|██████    | 946/1563 [00:27<00:18, 33.46it/s]

batch 940 loss: 0.41428112983703613


Train, Epoch 8 / 20:  61%|██████    | 954/1563 [00:27<00:17, 34.57it/s]

batch 950 loss: 0.4429601848125458


Train, Epoch 8 / 20:  62%|██████▏   | 966/1563 [00:27<00:16, 35.59it/s]

batch 960 loss: 0.39371359795331956


Train, Epoch 8 / 20:  62%|██████▏   | 974/1563 [00:27<00:16, 36.04it/s]

batch 970 loss: 0.5103298500180244


Train, Epoch 8 / 20:  63%|██████▎   | 986/1563 [00:28<00:16, 36.05it/s]

batch 980 loss: 0.4005001574754715


Train, Epoch 8 / 20:  64%|██████▎   | 994/1563 [00:28<00:15, 36.32it/s]

batch 990 loss: 0.4557314723730087


Train, Epoch 8 / 20:  64%|██████▍   | 1006/1563 [00:28<00:15, 36.08it/s]

batch 1000 loss: 0.3933513730764389


Train, Epoch 8 / 20:  65%|██████▍   | 1014/1563 [00:28<00:15, 35.73it/s]

batch 1010 loss: 0.44451640248298646


Train, Epoch 8 / 20:  66%|██████▌   | 1026/1563 [00:29<00:14, 35.97it/s]

batch 1020 loss: 0.49674331247806547


Train, Epoch 8 / 20:  66%|██████▌   | 1034/1563 [00:29<00:14, 35.53it/s]

batch 1030 loss: 0.49100445210933685


Train, Epoch 8 / 20:  67%|██████▋   | 1046/1563 [00:29<00:14, 36.28it/s]

batch 1040 loss: 0.42190811038017273


Train, Epoch 8 / 20:  67%|██████▋   | 1054/1563 [00:30<00:14, 35.80it/s]

batch 1050 loss: 0.3900384396314621


Train, Epoch 8 / 20:  68%|██████▊   | 1066/1563 [00:30<00:14, 35.26it/s]

batch 1060 loss: 0.4544931411743164


Train, Epoch 8 / 20:  69%|██████▊   | 1074/1563 [00:30<00:13, 35.79it/s]

batch 1070 loss: 0.46696137487888334


Train, Epoch 8 / 20:  69%|██████▉   | 1086/1563 [00:30<00:13, 36.35it/s]

batch 1080 loss: 0.39673291742801664


Train, Epoch 8 / 20:  70%|██████▉   | 1094/1563 [00:31<00:13, 35.78it/s]

batch 1090 loss: 0.41980779618024827


Train, Epoch 8 / 20:  71%|███████   | 1106/1563 [00:31<00:12, 35.81it/s]

batch 1100 loss: 0.40796066969633105


Train, Epoch 8 / 20:  71%|███████▏  | 1114/1563 [00:31<00:12, 36.15it/s]

batch 1110 loss: 0.43467129915952685


Train, Epoch 8 / 20:  72%|███████▏  | 1126/1563 [00:32<00:12, 36.29it/s]

batch 1120 loss: 0.2863749533891678


Train, Epoch 8 / 20:  73%|███████▎  | 1134/1563 [00:32<00:11, 36.10it/s]

batch 1130 loss: 0.42528495788574217


Train, Epoch 8 / 20:  73%|███████▎  | 1146/1563 [00:32<00:11, 36.33it/s]

batch 1140 loss: 0.39440860152244567


Train, Epoch 8 / 20:  74%|███████▍  | 1154/1563 [00:32<00:11, 36.59it/s]

batch 1150 loss: 0.3797087222337723


Train, Epoch 8 / 20:  75%|███████▍  | 1166/1563 [00:33<00:10, 36.48it/s]

batch 1160 loss: 0.4780413657426834


Train, Epoch 8 / 20:  75%|███████▌  | 1174/1563 [00:33<00:10, 36.32it/s]

batch 1170 loss: 0.3775437340140343


Train, Epoch 8 / 20:  76%|███████▌  | 1186/1563 [00:33<00:10, 36.56it/s]

batch 1180 loss: 0.5245864734053611


Train, Epoch 8 / 20:  76%|███████▋  | 1194/1563 [00:33<00:10, 36.41it/s]

batch 1190 loss: 0.46724121272563934


Train, Epoch 8 / 20:  77%|███████▋  | 1206/1563 [00:34<00:09, 36.53it/s]

batch 1200 loss: 0.4609989643096924


Train, Epoch 8 / 20:  78%|███████▊  | 1214/1563 [00:34<00:09, 36.37it/s]

batch 1210 loss: 0.43115378767251966


Train, Epoch 8 / 20:  78%|███████▊  | 1226/1563 [00:34<00:09, 36.23it/s]

batch 1220 loss: 0.39736652970314024


Train, Epoch 8 / 20:  79%|███████▉  | 1234/1563 [00:35<00:09, 36.22it/s]

batch 1230 loss: 0.4288462162017822


Train, Epoch 8 / 20:  80%|███████▉  | 1246/1563 [00:35<00:08, 36.43it/s]

batch 1240 loss: 0.41587600857019424


Train, Epoch 8 / 20:  80%|████████  | 1254/1563 [00:35<00:08, 36.27it/s]

batch 1250 loss: 0.42197037786245345


Train, Epoch 8 / 20:  81%|████████  | 1266/1563 [00:35<00:08, 36.50it/s]

batch 1260 loss: 0.36416790783405306


Train, Epoch 8 / 20:  82%|████████▏ | 1274/1563 [00:36<00:07, 36.24it/s]

batch 1270 loss: 0.3510786071419716


Train, Epoch 8 / 20:  82%|████████▏ | 1286/1563 [00:36<00:07, 36.14it/s]

batch 1280 loss: 0.373132036626339


Train, Epoch 8 / 20:  83%|████████▎ | 1294/1563 [00:36<00:07, 36.45it/s]

batch 1290 loss: 0.3880839332938194


Train, Epoch 8 / 20:  84%|████████▎ | 1306/1563 [00:37<00:07, 34.27it/s]

batch 1300 loss: 0.45230919122695923


Train, Epoch 8 / 20:  84%|████████▍ | 1314/1563 [00:37<00:07, 33.63it/s]

batch 1310 loss: 0.5021959871053696


Train, Epoch 8 / 20:  85%|████████▍ | 1326/1563 [00:37<00:07, 32.76it/s]

batch 1320 loss: 0.43308454751968384


Train, Epoch 8 / 20:  85%|████████▌ | 1334/1563 [00:37<00:07, 32.34it/s]

batch 1330 loss: 0.36133653968572615


Train, Epoch 8 / 20:  86%|████████▌ | 1346/1563 [00:38<00:06, 32.39it/s]

batch 1340 loss: 0.4133791372179985


Train, Epoch 8 / 20:  87%|████████▋ | 1354/1563 [00:38<00:06, 31.88it/s]

batch 1350 loss: 0.34703140407800676


Train, Epoch 8 / 20:  87%|████████▋ | 1366/1563 [00:38<00:06, 31.25it/s]

batch 1360 loss: 0.5033133447170257


Train, Epoch 8 / 20:  88%|████████▊ | 1374/1563 [00:39<00:05, 31.53it/s]

batch 1370 loss: 0.412362764775753


Train, Epoch 8 / 20:  89%|████████▊ | 1386/1563 [00:39<00:05, 33.97it/s]

batch 1380 loss: 0.38165476024150846


Train, Epoch 8 / 20:  89%|████████▉ | 1394/1563 [00:39<00:04, 35.05it/s]

batch 1390 loss: 0.41044190526008606


Train, Epoch 8 / 20:  90%|████████▉ | 1406/1563 [00:40<00:04, 35.61it/s]

batch 1400 loss: 0.49829064309597015


Train, Epoch 8 / 20:  90%|█████████ | 1414/1563 [00:40<00:04, 35.23it/s]

batch 1410 loss: 0.42457708418369294


Train, Epoch 8 / 20:  91%|█████████ | 1426/1563 [00:40<00:03, 35.08it/s]

batch 1420 loss: 0.3429900974035263


Train, Epoch 8 / 20:  92%|█████████▏| 1434/1563 [00:40<00:03, 35.34it/s]

batch 1430 loss: 0.4426513373851776


Train, Epoch 8 / 20:  93%|█████████▎| 1446/1563 [00:41<00:03, 35.85it/s]

batch 1440 loss: 0.35427444726228713


Train, Epoch 8 / 20:  93%|█████████▎| 1454/1563 [00:41<00:03, 36.07it/s]

batch 1450 loss: 0.45314319580793383


Train, Epoch 8 / 20:  94%|█████████▍| 1466/1563 [00:41<00:02, 35.73it/s]

batch 1460 loss: 0.4509172007441521


Train, Epoch 8 / 20:  94%|█████████▍| 1474/1563 [00:41<00:02, 35.89it/s]

batch 1470 loss: 0.3934380903840065


Train, Epoch 8 / 20:  95%|█████████▌| 1486/1563 [00:42<00:02, 36.21it/s]

batch 1480 loss: 0.40783066153526304


Train, Epoch 8 / 20:  96%|█████████▌| 1494/1563 [00:42<00:01, 35.68it/s]

batch 1490 loss: 0.41854177713394164


Train, Epoch 8 / 20:  96%|█████████▋| 1506/1563 [00:42<00:01, 36.15it/s]

batch 1500 loss: 0.5193633198738098


Train, Epoch 8 / 20:  97%|█████████▋| 1514/1563 [00:43<00:01, 36.19it/s]

batch 1510 loss: 0.4643618941307068


Train, Epoch 8 / 20:  98%|█████████▊| 1526/1563 [00:43<00:01, 35.95it/s]

batch 1520 loss: 0.4818991720676422


Train, Epoch 8 / 20:  98%|█████████▊| 1534/1563 [00:43<00:00, 35.42it/s]

batch 1530 loss: 0.5253838539123535


Train, Epoch 8 / 20:  99%|█████████▉| 1546/1563 [00:43<00:00, 36.24it/s]

batch 1540 loss: 0.430621463060379


Train, Epoch 8 / 20:  99%|█████████▉| 1554/1563 [00:44<00:00, 36.51it/s]

batch 1550 loss: 0.4178227409720421


Train, Epoch 8 / 20: 100%|██████████| 1563/1563 [00:44<00:00, 35.17it/s]


batch 1560 loss: 0.4061270445585251


Test, Epoch 8 / 20: 100%|██████████| 1563/1563 [00:21<00:00, 73.11it/s]


Epoch 8, loss: 0.4591383590579033, accuracy: 0.78328


Train, Epoch 9 / 20:   1%|          | 16/1563 [00:00<00:42, 36.52it/s]

batch 10 loss: 0.4697138279676437


Train, Epoch 9 / 20:   2%|▏         | 24/1563 [00:00<00:42, 36.22it/s]

batch 20 loss: 0.38618978708982465


Train, Epoch 9 / 20:   2%|▏         | 36/1563 [00:00<00:42, 36.35it/s]

batch 30 loss: 0.45173889994621275


Train, Epoch 9 / 20:   3%|▎         | 44/1563 [00:01<00:41, 36.47it/s]

batch 40 loss: 0.42481568455696106


Train, Epoch 9 / 20:   4%|▎         | 56/1563 [00:01<00:41, 36.30it/s]

batch 50 loss: 0.4830700755119324


Train, Epoch 9 / 20:   4%|▍         | 64/1563 [00:01<00:41, 35.79it/s]

batch 60 loss: 0.4016789197921753


Train, Epoch 9 / 20:   5%|▍         | 76/1563 [00:02<00:41, 35.56it/s]

batch 70 loss: 0.4319345384836197


Train, Epoch 9 / 20:   5%|▌         | 84/1563 [00:02<00:41, 35.37it/s]

batch 80 loss: 0.39667406380176545


Train, Epoch 9 / 20:   6%|▌         | 96/1563 [00:02<00:41, 35.52it/s]

batch 90 loss: 0.40933877527713775


Train, Epoch 9 / 20:   7%|▋         | 104/1563 [00:02<00:40, 35.77it/s]

batch 100 loss: 0.39933792501688004


Train, Epoch 9 / 20:   7%|▋         | 116/1563 [00:03<00:39, 36.26it/s]

batch 110 loss: 0.41360044330358503


Train, Epoch 9 / 20:   8%|▊         | 124/1563 [00:03<00:39, 36.40it/s]

batch 120 loss: 0.4989406555891037


Train, Epoch 9 / 20:   9%|▊         | 136/1563 [00:03<00:39, 35.88it/s]

batch 130 loss: 0.39693451672792435


Train, Epoch 9 / 20:   9%|▉         | 144/1563 [00:04<00:39, 35.96it/s]

batch 140 loss: 0.38938909620046613


Train, Epoch 9 / 20:  10%|▉         | 156/1563 [00:04<00:38, 36.11it/s]

batch 150 loss: 0.41459123492240907


Train, Epoch 9 / 20:  10%|█         | 164/1563 [00:04<00:38, 36.09it/s]

batch 160 loss: 0.46277900636196134


Train, Epoch 9 / 20:  11%|█▏        | 176/1563 [00:04<00:39, 35.43it/s]

batch 170 loss: 0.43297099471092226


Train, Epoch 9 / 20:  12%|█▏        | 184/1563 [00:05<00:38, 35.96it/s]

batch 180 loss: 0.47671372890472413


Train, Epoch 9 / 20:  13%|█▎        | 196/1563 [00:05<00:37, 36.05it/s]

batch 190 loss: 0.41567629426717756


Train, Epoch 9 / 20:  13%|█▎        | 204/1563 [00:05<00:38, 35.36it/s]

batch 200 loss: 0.42985943257808684


Train, Epoch 9 / 20:  14%|█▍        | 216/1563 [00:06<00:37, 36.02it/s]

batch 210 loss: 0.3833445906639099


Train, Epoch 9 / 20:  14%|█▍        | 224/1563 [00:06<00:36, 36.23it/s]

batch 220 loss: 0.49200237095355986


Train, Epoch 9 / 20:  15%|█▌        | 236/1563 [00:06<00:36, 36.48it/s]

batch 230 loss: 0.485490545630455


Train, Epoch 9 / 20:  16%|█▌        | 244/1563 [00:06<00:36, 36.18it/s]

batch 240 loss: 0.4037309318780899


Train, Epoch 9 / 20:  16%|█▋        | 256/1563 [00:07<00:36, 36.22it/s]

batch 250 loss: 0.4368327438831329


Train, Epoch 9 / 20:  17%|█▋        | 264/1563 [00:07<00:36, 35.68it/s]

batch 260 loss: 0.4797631323337555


Train, Epoch 9 / 20:  18%|█▊        | 276/1563 [00:07<00:35, 36.02it/s]

batch 270 loss: 0.4136024683713913


Train, Epoch 9 / 20:  18%|█▊        | 284/1563 [00:07<00:35, 35.63it/s]

batch 280 loss: 0.3970037415623665


Train, Epoch 9 / 20:  19%|█▉        | 296/1563 [00:08<00:38, 33.06it/s]

batch 290 loss: 0.440891170501709


Train, Epoch 9 / 20:  19%|█▉        | 304/1563 [00:08<00:38, 32.67it/s]

batch 300 loss: 0.37851705253124235


Train, Epoch 9 / 20:  20%|██        | 316/1563 [00:08<00:39, 31.81it/s]

batch 310 loss: 0.3828654855489731


Train, Epoch 9 / 20:  21%|██        | 324/1563 [00:09<00:39, 31.53it/s]

batch 320 loss: 0.39572473168373107


Train, Epoch 9 / 20:  21%|██▏       | 336/1563 [00:09<00:39, 30.78it/s]

batch 330 loss: 0.3939122840762138


Train, Epoch 9 / 20:  22%|██▏       | 344/1563 [00:09<00:39, 30.86it/s]

batch 340 loss: 0.40765847712755204


Train, Epoch 9 / 20:  23%|██▎       | 356/1563 [00:10<00:38, 31.23it/s]

batch 350 loss: 0.2983736217021942


Train, Epoch 9 / 20:  23%|██▎       | 364/1563 [00:10<00:36, 33.21it/s]

batch 360 loss: 0.36444805562496185


Train, Epoch 9 / 20:  24%|██▍       | 376/1563 [00:10<00:34, 34.56it/s]

batch 370 loss: 0.4234745651483536


Train, Epoch 9 / 20:  25%|██▍       | 384/1563 [00:10<00:33, 35.25it/s]

batch 380 loss: 0.43205671608448026


Train, Epoch 9 / 20:  25%|██▌       | 396/1563 [00:11<00:32, 36.19it/s]

batch 390 loss: 0.4710182249546051


Train, Epoch 9 / 20:  26%|██▌       | 404/1563 [00:11<00:31, 36.28it/s]

batch 400 loss: 0.3945339202880859


Train, Epoch 9 / 20:  27%|██▋       | 416/1563 [00:11<00:31, 36.52it/s]

batch 410 loss: 0.3755375973880291


Train, Epoch 9 / 20:  27%|██▋       | 424/1563 [00:12<00:31, 35.95it/s]

batch 420 loss: 0.4137792021036148


Train, Epoch 9 / 20:  28%|██▊       | 436/1563 [00:12<00:31, 36.04it/s]

batch 430 loss: 0.45684674084186555


Train, Epoch 9 / 20:  28%|██▊       | 444/1563 [00:12<00:30, 36.57it/s]

batch 440 loss: 0.4223658576607704


Train, Epoch 9 / 20:  29%|██▉       | 456/1563 [00:12<00:30, 36.11it/s]

batch 450 loss: 0.4338253945112228


Train, Epoch 9 / 20:  30%|██▉       | 464/1563 [00:13<00:30, 36.22it/s]

batch 460 loss: 0.43829022347927094


Train, Epoch 9 / 20:  30%|███       | 476/1563 [00:13<00:29, 36.95it/s]

batch 470 loss: 0.418265825510025


Train, Epoch 9 / 20:  31%|███       | 484/1563 [00:13<00:29, 36.92it/s]

batch 480 loss: 0.34942452758550646


Train, Epoch 9 / 20:  32%|███▏      | 496/1563 [00:14<00:29, 36.22it/s]

batch 490 loss: 0.4612451076507568


Train, Epoch 9 / 20:  32%|███▏      | 504/1563 [00:14<00:29, 35.74it/s]

batch 500 loss: 0.46468021124601366


Train, Epoch 9 / 20:  33%|███▎      | 516/1563 [00:14<00:28, 36.45it/s]

batch 510 loss: 0.319768650829792


Train, Epoch 9 / 20:  34%|███▎      | 524/1563 [00:14<00:28, 36.50it/s]

batch 520 loss: 0.4973640084266663


Train, Epoch 9 / 20:  34%|███▍      | 536/1563 [00:15<00:28, 36.37it/s]

batch 530 loss: 0.4043690860271454


Train, Epoch 9 / 20:  35%|███▍      | 544/1563 [00:15<00:28, 36.16it/s]

batch 540 loss: 0.376951864361763


Train, Epoch 9 / 20:  36%|███▌      | 556/1563 [00:15<00:27, 36.56it/s]

batch 550 loss: 0.3390360578894615


Train, Epoch 9 / 20:  36%|███▌      | 564/1563 [00:15<00:27, 36.03it/s]

batch 560 loss: 0.4196973413228989


Train, Epoch 9 / 20:  37%|███▋      | 576/1563 [00:16<00:27, 35.65it/s]

batch 570 loss: 0.4978926420211792


Train, Epoch 9 / 20:  37%|███▋      | 584/1563 [00:16<00:27, 35.96it/s]

batch 580 loss: 0.37889719307422637


Train, Epoch 9 / 20:  38%|███▊      | 596/1563 [00:16<00:26, 36.19it/s]

batch 590 loss: 0.3811770066618919


Train, Epoch 9 / 20:  39%|███▊      | 604/1563 [00:17<00:26, 36.26it/s]

batch 600 loss: 0.42731029987335206


Train, Epoch 9 / 20:  39%|███▉      | 616/1563 [00:17<00:26, 36.23it/s]

batch 610 loss: 0.4964064538478851


Train, Epoch 9 / 20:  40%|███▉      | 624/1563 [00:17<00:25, 36.26it/s]

batch 620 loss: 0.42177354246377946


Train, Epoch 9 / 20:  41%|████      | 636/1563 [00:17<00:25, 36.14it/s]

batch 630 loss: 0.372319769859314


Train, Epoch 9 / 20:  41%|████      | 644/1563 [00:18<00:25, 35.35it/s]

batch 640 loss: 0.4349472939968109


Train, Epoch 9 / 20:  42%|████▏     | 656/1563 [00:18<00:25, 36.02it/s]

batch 650 loss: 0.4148555710911751


Train, Epoch 9 / 20:  42%|████▏     | 664/1563 [00:18<00:25, 35.56it/s]

batch 660 loss: 0.3582298904657364


Train, Epoch 9 / 20:  43%|████▎     | 676/1563 [00:19<00:24, 35.81it/s]

batch 670 loss: 0.3751981183886528


Train, Epoch 9 / 20:  44%|████▍     | 684/1563 [00:19<00:25, 35.10it/s]

batch 680 loss: 0.47059165239334105


Train, Epoch 9 / 20:  45%|████▍     | 696/1563 [00:19<00:24, 35.76it/s]

batch 690 loss: 0.42885390520095823


Train, Epoch 9 / 20:  45%|████▌     | 704/1563 [00:19<00:23, 36.05it/s]

batch 700 loss: 0.4739172548055649


Train, Epoch 9 / 20:  46%|████▌     | 716/1563 [00:20<00:23, 35.66it/s]

batch 710 loss: 0.48444646298885347


Train, Epoch 9 / 20:  46%|████▋     | 724/1563 [00:20<00:24, 33.71it/s]

batch 720 loss: 0.3887106865644455


Train, Epoch 9 / 20:  47%|████▋     | 736/1563 [00:20<00:26, 31.43it/s]

batch 730 loss: 0.39403067231178285


Train, Epoch 9 / 20:  48%|████▊     | 744/1563 [00:21<00:25, 31.56it/s]

batch 740 loss: 0.3601905956864357


Train, Epoch 9 / 20:  48%|████▊     | 756/1563 [00:21<00:25, 31.27it/s]

batch 750 loss: 0.385234571993351


Train, Epoch 9 / 20:  49%|████▉     | 764/1563 [00:21<00:25, 31.89it/s]

batch 760 loss: 0.40448596626520156


Train, Epoch 9 / 20:  50%|████▉     | 776/1563 [00:22<00:24, 32.05it/s]

batch 770 loss: 0.44112259745597837


Train, Epoch 9 / 20:  50%|█████     | 784/1563 [00:22<00:24, 31.68it/s]

batch 780 loss: 0.4044181019067764


Train, Epoch 9 / 20:  51%|█████     | 796/1563 [00:22<00:24, 31.65it/s]

batch 790 loss: 0.5780183732509613


Train, Epoch 9 / 20:  51%|█████▏    | 804/1563 [00:22<00:22, 33.58it/s]

batch 800 loss: 0.4223803386092186


Train, Epoch 9 / 20:  52%|█████▏    | 816/1563 [00:23<00:21, 35.34it/s]

batch 810 loss: 0.4247331902384758


Train, Epoch 9 / 20:  53%|█████▎    | 824/1563 [00:23<00:20, 35.88it/s]

batch 820 loss: 0.39251298606395724


Train, Epoch 9 / 20:  53%|█████▎    | 836/1563 [00:23<00:20, 36.32it/s]

batch 830 loss: 0.49897451400756837


Train, Epoch 9 / 20:  54%|█████▍    | 844/1563 [00:24<00:19, 36.65it/s]

batch 840 loss: 0.37410505264997485


Train, Epoch 9 / 20:  55%|█████▍    | 856/1563 [00:24<00:19, 36.51it/s]

batch 850 loss: 0.4050548017024994


Train, Epoch 9 / 20:  55%|█████▌    | 864/1563 [00:24<00:19, 36.17it/s]

batch 860 loss: 0.40642060041427613


Train, Epoch 9 / 20:  56%|█████▌    | 876/1563 [00:24<00:19, 35.84it/s]

batch 870 loss: 0.4072558805346489


Train, Epoch 9 / 20:  57%|█████▋    | 884/1563 [00:25<00:19, 35.50it/s]

batch 880 loss: 0.40592520534992216


Train, Epoch 9 / 20:  57%|█████▋    | 896/1563 [00:25<00:18, 36.16it/s]

batch 890 loss: 0.383754800260067


Train, Epoch 9 / 20:  58%|█████▊    | 904/1563 [00:25<00:18, 36.15it/s]

batch 900 loss: 0.4176436334848404


Train, Epoch 9 / 20:  59%|█████▊    | 916/1563 [00:26<00:17, 36.08it/s]

batch 910 loss: 0.39789691269397737


Train, Epoch 9 / 20:  59%|█████▉    | 924/1563 [00:26<00:17, 36.60it/s]

batch 920 loss: 0.48683042228221896


Train, Epoch 9 / 20:  60%|█████▉    | 936/1563 [00:26<00:17, 36.74it/s]

batch 930 loss: 0.4319822579622269


Train, Epoch 9 / 20:  60%|██████    | 944/1563 [00:26<00:16, 36.72it/s]

batch 940 loss: 0.3599892929196358


Train, Epoch 9 / 20:  61%|██████    | 956/1563 [00:27<00:16, 36.75it/s]

batch 950 loss: 0.3896423771977425


Train, Epoch 9 / 20:  62%|██████▏   | 964/1563 [00:27<00:16, 36.21it/s]

batch 960 loss: 0.45843543112277985


Train, Epoch 9 / 20:  62%|██████▏   | 976/1563 [00:27<00:16, 36.38it/s]

batch 970 loss: 0.45805783569812775


Train, Epoch 9 / 20:  63%|██████▎   | 984/1563 [00:27<00:15, 36.49it/s]

batch 980 loss: 0.4288326919078827


Train, Epoch 9 / 20:  64%|██████▎   | 996/1563 [00:28<00:15, 36.93it/s]

batch 990 loss: 0.48957171738147737


Train, Epoch 9 / 20:  64%|██████▍   | 1004/1563 [00:28<00:15, 36.11it/s]

batch 1000 loss: 0.3645009115338326


Train, Epoch 9 / 20:  65%|██████▌   | 1016/1563 [00:28<00:15, 36.26it/s]

batch 1010 loss: 0.37172935009002683


Train, Epoch 9 / 20:  66%|██████▌   | 1024/1563 [00:29<00:14, 36.24it/s]

batch 1020 loss: 0.3914379060268402


Train, Epoch 9 / 20:  66%|██████▋   | 1036/1563 [00:29<00:14, 36.61it/s]

batch 1030 loss: 0.4002134442329407


Train, Epoch 9 / 20:  67%|██████▋   | 1044/1563 [00:29<00:14, 35.71it/s]

batch 1040 loss: 0.4679463908076286


Train, Epoch 9 / 20:  68%|██████▊   | 1056/1563 [00:29<00:14, 35.37it/s]

batch 1050 loss: 0.509763988852501


Train, Epoch 9 / 20:  68%|██████▊   | 1064/1563 [00:30<00:14, 35.41it/s]

batch 1060 loss: 0.38481918126344683


Train, Epoch 9 / 20:  69%|██████▉   | 1076/1563 [00:30<00:13, 35.09it/s]

batch 1070 loss: 0.38786288201808927


Train, Epoch 9 / 20:  69%|██████▉   | 1084/1563 [00:30<00:13, 34.92it/s]

batch 1080 loss: 0.4159469619393349


Train, Epoch 9 / 20:  70%|███████   | 1096/1563 [00:31<00:13, 35.35it/s]

batch 1090 loss: 0.4069239869713783


Train, Epoch 9 / 20:  71%|███████   | 1104/1563 [00:31<00:12, 35.81it/s]

batch 1100 loss: 0.42059677839279175


Train, Epoch 9 / 20:  71%|███████▏  | 1116/1563 [00:31<00:12, 35.39it/s]

batch 1110 loss: 0.45739752650260923


Train, Epoch 9 / 20:  72%|███████▏  | 1124/1563 [00:31<00:12, 35.57it/s]

batch 1120 loss: 0.4245008617639542


Train, Epoch 9 / 20:  73%|███████▎  | 1136/1563 [00:32<00:11, 35.74it/s]

batch 1130 loss: 0.41925764083862305


Train, Epoch 9 / 20:  73%|███████▎  | 1144/1563 [00:32<00:11, 35.79it/s]

batch 1140 loss: 0.4038839489221573


Train, Epoch 9 / 20:  74%|███████▍  | 1156/1563 [00:32<00:11, 36.09it/s]

batch 1150 loss: 0.4080620020627975


Train, Epoch 9 / 20:  74%|███████▍  | 1164/1563 [00:32<00:11, 33.25it/s]

batch 1160 loss: 0.4060883656144142


Train, Epoch 9 / 20:  75%|███████▌  | 1176/1563 [00:33<00:11, 32.50it/s]

batch 1170 loss: 0.3729806676506996


Train, Epoch 9 / 20:  76%|███████▌  | 1184/1563 [00:33<00:12, 31.57it/s]

batch 1180 loss: 0.3808103069663048


Train, Epoch 9 / 20:  77%|███████▋  | 1196/1563 [00:33<00:11, 32.61it/s]

batch 1190 loss: 0.36055555641651155


Train, Epoch 9 / 20:  77%|███████▋  | 1204/1563 [00:34<00:10, 33.44it/s]

batch 1200 loss: 0.3378892496228218


Train, Epoch 9 / 20:  78%|███████▊  | 1216/1563 [00:34<00:10, 31.97it/s]

batch 1210 loss: 0.39585118740797043


Train, Epoch 9 / 20:  78%|███████▊  | 1224/1563 [00:34<00:10, 30.86it/s]

batch 1220 loss: 0.3855677366256714


Train, Epoch 9 / 20:  79%|███████▉  | 1236/1563 [00:35<00:10, 31.65it/s]

batch 1230 loss: 0.38129401206970215


Train, Epoch 9 / 20:  80%|███████▉  | 1244/1563 [00:35<00:09, 33.75it/s]

batch 1240 loss: 0.39408778548240664


Train, Epoch 9 / 20:  80%|████████  | 1256/1563 [00:35<00:08, 34.83it/s]

batch 1250 loss: 0.43204200863838194


Train, Epoch 9 / 20:  81%|████████  | 1264/1563 [00:36<00:08, 35.17it/s]

batch 1260 loss: 0.380862694978714


Train, Epoch 9 / 20:  82%|████████▏ | 1276/1563 [00:36<00:07, 36.01it/s]

batch 1270 loss: 0.3851640775799751


Train, Epoch 9 / 20:  82%|████████▏ | 1284/1563 [00:36<00:07, 35.44it/s]

batch 1280 loss: 0.3938797727227211


Train, Epoch 9 / 20:  83%|████████▎ | 1296/1563 [00:36<00:07, 35.77it/s]

batch 1290 loss: 0.39692685306072234


Train, Epoch 9 / 20:  83%|████████▎ | 1304/1563 [00:37<00:07, 36.22it/s]

batch 1300 loss: 0.37710538804531096


Train, Epoch 9 / 20:  84%|████████▍ | 1316/1563 [00:37<00:06, 36.31it/s]

batch 1310 loss: 0.36333311945199964


Train, Epoch 9 / 20:  85%|████████▍ | 1324/1563 [00:37<00:06, 36.11it/s]

batch 1320 loss: 0.4379919469356537


Train, Epoch 9 / 20:  85%|████████▌ | 1336/1563 [00:38<00:06, 35.99it/s]

batch 1330 loss: 0.44307722747325895


Train, Epoch 9 / 20:  86%|████████▌ | 1344/1563 [00:38<00:06, 35.81it/s]

batch 1340 loss: 0.3587483361363411


Train, Epoch 9 / 20:  87%|████████▋ | 1356/1563 [00:38<00:05, 36.18it/s]

batch 1350 loss: 0.4171107590198517


Train, Epoch 9 / 20:  87%|████████▋ | 1364/1563 [00:38<00:05, 35.76it/s]

batch 1360 loss: 0.3717815726995468


Train, Epoch 9 / 20:  88%|████████▊ | 1376/1563 [00:39<00:05, 36.42it/s]

batch 1370 loss: 0.371824124455452


Train, Epoch 9 / 20:  89%|████████▊ | 1384/1563 [00:39<00:04, 36.35it/s]

batch 1380 loss: 0.4560153841972351


Train, Epoch 9 / 20:  89%|████████▉ | 1396/1563 [00:39<00:04, 36.58it/s]

batch 1390 loss: 0.41629784405231474


Train, Epoch 9 / 20:  90%|████████▉ | 1404/1563 [00:39<00:04, 36.67it/s]

batch 1400 loss: 0.4251436561346054


Train, Epoch 9 / 20:  91%|█████████ | 1416/1563 [00:40<00:04, 36.61it/s]

batch 1410 loss: 0.42653131783008574


Train, Epoch 9 / 20:  91%|█████████ | 1424/1563 [00:40<00:03, 36.46it/s]

batch 1420 loss: 0.30741478502750397


Train, Epoch 9 / 20:  92%|█████████▏| 1436/1563 [00:40<00:03, 35.86it/s]

batch 1430 loss: 0.4827328249812126


Train, Epoch 9 / 20:  92%|█████████▏| 1444/1563 [00:41<00:03, 35.70it/s]

batch 1440 loss: 0.38340702205896376


Train, Epoch 9 / 20:  93%|█████████▎| 1456/1563 [00:41<00:02, 35.88it/s]

batch 1450 loss: 0.373620942234993


Train, Epoch 9 / 20:  94%|█████████▎| 1464/1563 [00:41<00:02, 35.83it/s]

batch 1460 loss: 0.4425336390733719


Train, Epoch 9 / 20:  94%|█████████▍| 1476/1563 [00:41<00:02, 35.69it/s]

batch 1470 loss: 0.42834134101867677


Train, Epoch 9 / 20:  95%|█████████▍| 1484/1563 [00:42<00:02, 35.11it/s]

batch 1480 loss: 0.3821978151798248


Train, Epoch 9 / 20:  96%|█████████▌| 1496/1563 [00:42<00:01, 35.73it/s]

batch 1490 loss: 0.4938738405704498


Train, Epoch 9 / 20:  96%|█████████▌| 1504/1563 [00:42<00:01, 36.13it/s]

batch 1500 loss: 0.3833660617470741


Train, Epoch 9 / 20:  97%|█████████▋| 1516/1563 [00:43<00:01, 36.12it/s]

batch 1510 loss: 0.4868494927883148


Train, Epoch 9 / 20:  98%|█████████▊| 1524/1563 [00:43<00:01, 35.68it/s]

batch 1520 loss: 0.32900144159793854


Train, Epoch 9 / 20:  98%|█████████▊| 1536/1563 [00:43<00:00, 36.48it/s]

batch 1530 loss: 0.39404753148555755


Train, Epoch 9 / 20:  99%|█████████▉| 1544/1563 [00:43<00:00, 36.10it/s]

batch 1540 loss: 0.4053283154964447


Train, Epoch 9 / 20: 100%|█████████▉| 1556/1563 [00:44<00:00, 36.37it/s]

batch 1550 loss: 0.3636444672942162


Train, Epoch 9 / 20: 100%|██████████| 1563/1563 [00:44<00:00, 35.24it/s]


batch 1560 loss: 0.4457184821367264


Test, Epoch 9 / 20: 100%|██████████| 1563/1563 [00:21<00:00, 72.53it/s]


Epoch 9, loss: 0.46850153499007224, accuracy: 0.78468


Train, Epoch 10 / 20:   1%|          | 16/1563 [00:00<00:43, 35.83it/s]

batch 10 loss: 0.3712239325046539


Train, Epoch 10 / 20:   2%|▏         | 24/1563 [00:00<00:42, 35.90it/s]

batch 20 loss: 0.3906202748417854


Train, Epoch 10 / 20:   2%|▏         | 36/1563 [00:01<00:43, 35.51it/s]

batch 30 loss: 0.38341712653636933


Train, Epoch 10 / 20:   3%|▎         | 44/1563 [00:01<00:42, 35.99it/s]

batch 40 loss: 0.4201774477958679


Train, Epoch 10 / 20:   4%|▎         | 56/1563 [00:01<00:41, 36.23it/s]

batch 50 loss: 0.379152312874794


Train, Epoch 10 / 20:   4%|▍         | 64/1563 [00:01<00:40, 36.58it/s]

batch 60 loss: 0.4070602387189865


Train, Epoch 10 / 20:   5%|▍         | 76/1563 [00:02<00:40, 36.67it/s]

batch 70 loss: 0.37561060339212415


Train, Epoch 10 / 20:   5%|▌         | 84/1563 [00:02<00:40, 36.73it/s]

batch 80 loss: 0.3628381997346878


Train, Epoch 10 / 20:   6%|▌         | 96/1563 [00:02<00:40, 36.55it/s]

batch 90 loss: 0.3935284450650215


Train, Epoch 10 / 20:   7%|▋         | 104/1563 [00:02<00:40, 36.03it/s]

batch 100 loss: 0.426029697060585


Train, Epoch 10 / 20:   7%|▋         | 116/1563 [00:03<00:39, 36.23it/s]

batch 110 loss: 0.4215440511703491


Train, Epoch 10 / 20:   8%|▊         | 124/1563 [00:03<00:39, 36.11it/s]

batch 120 loss: 0.37059392780065536


Train, Epoch 10 / 20:   9%|▊         | 136/1563 [00:03<00:39, 35.82it/s]

batch 130 loss: 0.3746685177087784


Train, Epoch 10 / 20:   9%|▉         | 144/1563 [00:03<00:39, 36.01it/s]

batch 140 loss: 0.3732274517416954


Train, Epoch 10 / 20:  10%|▉         | 156/1563 [00:04<00:40, 34.56it/s]

batch 150 loss: 0.3808638289570808


Train, Epoch 10 / 20:  10%|█         | 164/1563 [00:04<00:42, 33.20it/s]

batch 160 loss: 0.34036834388971327


Train, Epoch 10 / 20:  11%|█▏        | 176/1563 [00:04<00:43, 31.85it/s]

batch 170 loss: 0.34389434158802035


Train, Epoch 10 / 20:  12%|█▏        | 184/1563 [00:05<00:44, 31.32it/s]

batch 180 loss: 0.38366384506225587


Train, Epoch 10 / 20:  13%|█▎        | 196/1563 [00:05<00:43, 31.57it/s]

batch 190 loss: 0.4030304104089737


Train, Epoch 10 / 20:  13%|█▎        | 204/1563 [00:05<00:43, 31.33it/s]

batch 200 loss: 0.39651081562042234


Train, Epoch 10 / 20:  14%|█▍        | 216/1563 [00:06<00:42, 31.34it/s]

batch 210 loss: 0.4118422999978065


Train, Epoch 10 / 20:  14%|█▍        | 224/1563 [00:06<00:42, 31.49it/s]

batch 220 loss: 0.39052911996841433


Train, Epoch 10 / 20:  15%|█▌        | 236/1563 [00:06<00:38, 34.39it/s]

batch 230 loss: 0.5161515399813652


Train, Epoch 10 / 20:  16%|█▌        | 244/1563 [00:07<00:37, 34.81it/s]

batch 240 loss: 0.345566126704216


Train, Epoch 10 / 20:  16%|█▋        | 256/1563 [00:07<00:36, 35.80it/s]

batch 250 loss: 0.39071763306856155


Train, Epoch 10 / 20:  17%|█▋        | 264/1563 [00:07<00:35, 36.23it/s]

batch 260 loss: 0.4991658806800842


Train, Epoch 10 / 20:  18%|█▊        | 276/1563 [00:07<00:35, 36.28it/s]

batch 270 loss: 0.4224513664841652


Train, Epoch 10 / 20:  18%|█▊        | 284/1563 [00:08<00:35, 35.72it/s]

batch 280 loss: 0.4065767675638199


Train, Epoch 10 / 20:  19%|█▉        | 296/1563 [00:08<00:35, 36.15it/s]

batch 290 loss: 0.36895027160644533


Train, Epoch 10 / 20:  19%|█▉        | 304/1563 [00:08<00:34, 36.17it/s]

batch 300 loss: 0.31443004459142687


Train, Epoch 10 / 20:  20%|██        | 316/1563 [00:09<00:34, 36.23it/s]

batch 310 loss: 0.3870823889970779


Train, Epoch 10 / 20:  21%|██        | 324/1563 [00:09<00:34, 35.65it/s]

batch 320 loss: 0.4029525384306908


Train, Epoch 10 / 20:  21%|██▏       | 336/1563 [00:09<00:34, 35.80it/s]

batch 330 loss: 0.40912913382053373


Train, Epoch 10 / 20:  22%|██▏       | 344/1563 [00:09<00:34, 35.75it/s]

batch 340 loss: 0.41602101624011995


Train, Epoch 10 / 20:  23%|██▎       | 356/1563 [00:10<00:33, 35.66it/s]

batch 350 loss: 0.3553814336657524


Train, Epoch 10 / 20:  23%|██▎       | 364/1563 [00:10<00:33, 35.92it/s]

batch 360 loss: 0.39050192534923556


Train, Epoch 10 / 20:  24%|██▍       | 376/1563 [00:10<00:32, 36.00it/s]

batch 370 loss: 0.44772749990224836


Train, Epoch 10 / 20:  25%|██▍       | 384/1563 [00:10<00:32, 35.91it/s]

batch 380 loss: 0.38947286605834963


Train, Epoch 10 / 20:  25%|██▌       | 396/1563 [00:11<00:32, 35.98it/s]

batch 390 loss: 0.43592569828033445


Train, Epoch 10 / 20:  26%|██▌       | 404/1563 [00:11<00:31, 36.53it/s]

batch 400 loss: 0.4159933775663376


Train, Epoch 10 / 20:  27%|██▋       | 416/1563 [00:11<00:31, 35.89it/s]

batch 410 loss: 0.4010000184178352


Train, Epoch 10 / 20:  27%|██▋       | 424/1563 [00:12<00:32, 35.03it/s]

batch 420 loss: 0.4628472238779068


Train, Epoch 10 / 20:  28%|██▊       | 436/1563 [00:12<00:31, 35.76it/s]

batch 430 loss: 0.35671696215868


Train, Epoch 10 / 20:  28%|██▊       | 444/1563 [00:12<00:31, 35.88it/s]

batch 440 loss: 0.39121956676244735


Train, Epoch 10 / 20:  29%|██▉       | 456/1563 [00:12<00:30, 36.34it/s]

batch 450 loss: 0.3554960861802101


Train, Epoch 10 / 20:  30%|██▉       | 464/1563 [00:13<00:30, 35.77it/s]

batch 460 loss: 0.3448710203170776


Train, Epoch 10 / 20:  30%|███       | 476/1563 [00:13<00:29, 36.37it/s]

batch 470 loss: 0.4070824980735779


Train, Epoch 10 / 20:  31%|███       | 484/1563 [00:13<00:29, 36.19it/s]

batch 480 loss: 0.43625869452953336


Train, Epoch 10 / 20:  32%|███▏      | 496/1563 [00:14<00:29, 35.73it/s]

batch 490 loss: 0.41814757585525514


Train, Epoch 10 / 20:  32%|███▏      | 504/1563 [00:14<00:29, 36.01it/s]

batch 500 loss: 0.3874689906835556


Train, Epoch 10 / 20:  33%|███▎      | 516/1563 [00:14<00:29, 36.04it/s]

batch 510 loss: 0.3795902252197266


Train, Epoch 10 / 20:  34%|███▎      | 524/1563 [00:14<00:28, 36.31it/s]

batch 520 loss: 0.36988423839211465


Train, Epoch 10 / 20:  34%|███▍      | 536/1563 [00:15<00:28, 36.17it/s]

batch 530 loss: 0.44688239991664885


Train, Epoch 10 / 20:  35%|███▍      | 544/1563 [00:15<00:28, 36.05it/s]

batch 540 loss: 0.469342777132988


Train, Epoch 10 / 20:  36%|███▌      | 556/1563 [00:15<00:27, 36.48it/s]

batch 550 loss: 0.3920187473297119


Train, Epoch 10 / 20:  36%|███▌      | 564/1563 [00:15<00:27, 36.25it/s]

batch 560 loss: 0.4192181810736656


Train, Epoch 10 / 20:  37%|███▋      | 576/1563 [00:16<00:27, 35.92it/s]

batch 570 loss: 0.35942327976226807


Train, Epoch 10 / 20:  37%|███▋      | 584/1563 [00:16<00:28, 34.89it/s]

batch 580 loss: 0.4009006544947624


Train, Epoch 10 / 20:  38%|███▊      | 596/1563 [00:16<00:28, 33.77it/s]

batch 590 loss: 0.45304442942142487


Train, Epoch 10 / 20:  39%|███▊      | 604/1563 [00:17<00:29, 32.81it/s]

batch 600 loss: 0.390187905728817


Train, Epoch 10 / 20:  39%|███▉      | 616/1563 [00:17<00:28, 33.50it/s]

batch 610 loss: 0.3488753944635391


Train, Epoch 10 / 20:  40%|███▉      | 624/1563 [00:17<00:28, 32.50it/s]

batch 620 loss: 0.42901509404182436


Train, Epoch 10 / 20:  41%|████      | 636/1563 [00:18<00:28, 32.18it/s]

batch 630 loss: 0.33998437970876694


Train, Epoch 10 / 20:  41%|████      | 644/1563 [00:18<00:29, 30.84it/s]

batch 640 loss: 0.5137995392084121


Train, Epoch 10 / 20:  42%|████▏     | 656/1563 [00:18<00:28, 32.00it/s]

batch 650 loss: 0.42477046251296996


Train, Epoch 10 / 20:  42%|████▏     | 664/1563 [00:19<00:26, 33.42it/s]

batch 660 loss: 0.4083906650543213


Train, Epoch 10 / 20:  43%|████▎     | 676/1563 [00:19<00:25, 35.10it/s]

batch 670 loss: 0.35462018847465515


Train, Epoch 10 / 20:  44%|████▍     | 684/1563 [00:19<00:24, 35.30it/s]

batch 680 loss: 0.30039731189608576


Train, Epoch 10 / 20:  45%|████▍     | 696/1563 [00:19<00:24, 35.71it/s]

batch 690 loss: 0.4354532554745674


Train, Epoch 10 / 20:  45%|████▌     | 704/1563 [00:20<00:24, 35.48it/s]

batch 700 loss: 0.4122240334749222


Train, Epoch 10 / 20:  46%|████▌     | 716/1563 [00:20<00:24, 35.18it/s]

batch 710 loss: 0.42898297756910325


Train, Epoch 10 / 20:  46%|████▋     | 724/1563 [00:20<00:23, 35.81it/s]

batch 720 loss: 0.3691874697804451


Train, Epoch 10 / 20:  47%|████▋     | 736/1563 [00:21<00:22, 36.37it/s]

batch 730 loss: 0.3621753677725792


Train, Epoch 10 / 20:  48%|████▊     | 744/1563 [00:21<00:22, 36.01it/s]

batch 740 loss: 0.39410833418369295


Train, Epoch 10 / 20:  48%|████▊     | 756/1563 [00:21<00:22, 36.13it/s]

batch 750 loss: 0.40619935393333434


Train, Epoch 10 / 20:  49%|████▉     | 764/1563 [00:21<00:22, 36.04it/s]

batch 760 loss: 0.4260546863079071


Train, Epoch 10 / 20:  50%|████▉     | 776/1563 [00:22<00:22, 35.44it/s]

batch 770 loss: 0.35294592678546904


Train, Epoch 10 / 20:  50%|█████     | 784/1563 [00:22<00:22, 35.17it/s]

batch 780 loss: 0.4005030930042267


Train, Epoch 10 / 20:  51%|█████     | 796/1563 [00:22<00:21, 34.97it/s]

batch 790 loss: 0.3313205987215042


Train, Epoch 10 / 20:  51%|█████▏    | 804/1563 [00:22<00:21, 35.83it/s]

batch 800 loss: 0.41348582208156587


Train, Epoch 10 / 20:  52%|█████▏    | 816/1563 [00:23<00:20, 35.89it/s]

batch 810 loss: 0.46967474818229676


Train, Epoch 10 / 20:  53%|█████▎    | 824/1563 [00:23<00:20, 35.66it/s]

batch 820 loss: 0.4666252702474594


Train, Epoch 10 / 20:  53%|█████▎    | 836/1563 [00:23<00:20, 35.10it/s]

batch 830 loss: 0.5040080130100251


Train, Epoch 10 / 20:  54%|█████▍    | 844/1563 [00:24<00:20, 35.07it/s]

batch 840 loss: 0.4112577348947525


Train, Epoch 10 / 20:  55%|█████▍    | 856/1563 [00:24<00:19, 35.92it/s]

batch 850 loss: 0.41543456763029096


Train, Epoch 10 / 20:  55%|█████▌    | 864/1563 [00:24<00:19, 36.25it/s]

batch 860 loss: 0.37172694206237794


Train, Epoch 10 / 20:  56%|█████▌    | 876/1563 [00:24<00:18, 36.52it/s]

batch 870 loss: 0.4807485699653625


Train, Epoch 10 / 20:  57%|█████▋    | 884/1563 [00:25<00:18, 36.03it/s]

batch 880 loss: 0.3621016666293144


Train, Epoch 10 / 20:  57%|█████▋    | 896/1563 [00:25<00:18, 35.47it/s]

batch 890 loss: 0.3799736022949219


Train, Epoch 10 / 20:  58%|█████▊    | 904/1563 [00:25<00:18, 35.23it/s]

batch 900 loss: 0.5096281737089157


Train, Epoch 10 / 20:  59%|█████▊    | 916/1563 [00:26<00:18, 35.76it/s]

batch 910 loss: 0.4021612122654915


Train, Epoch 10 / 20:  59%|█████▉    | 924/1563 [00:26<00:17, 36.09it/s]

batch 920 loss: 0.4114756226539612


Train, Epoch 10 / 20:  60%|█████▉    | 936/1563 [00:26<00:17, 35.65it/s]

batch 930 loss: 0.46529113948345185


Train, Epoch 10 / 20:  60%|██████    | 944/1563 [00:26<00:17, 35.44it/s]

batch 940 loss: 0.4243535876274109


Train, Epoch 10 / 20:  61%|██████    | 956/1563 [00:27<00:16, 36.30it/s]

batch 950 loss: 0.4117789477109909


Train, Epoch 10 / 20:  62%|██████▏   | 964/1563 [00:27<00:16, 36.27it/s]

batch 960 loss: 0.35990531742572784


Train, Epoch 10 / 20:  62%|██████▏   | 976/1563 [00:27<00:16, 35.94it/s]

batch 970 loss: 0.34235613495111467


Train, Epoch 10 / 20:  63%|██████▎   | 984/1563 [00:27<00:16, 35.93it/s]

batch 980 loss: 0.3143478736281395


Train, Epoch 10 / 20:  64%|██████▎   | 996/1563 [00:28<00:15, 36.21it/s]

batch 990 loss: 0.405178801715374


Train, Epoch 10 / 20:  64%|██████▍   | 1004/1563 [00:28<00:15, 35.96it/s]

batch 1000 loss: 0.3510856330394745


Train, Epoch 10 / 20:  65%|██████▌   | 1016/1563 [00:28<00:15, 36.20it/s]

batch 1010 loss: 0.4437064975500107


Train, Epoch 10 / 20:  66%|██████▌   | 1024/1563 [00:29<00:15, 34.04it/s]

batch 1020 loss: 0.36532599329948423


Train, Epoch 10 / 20:  66%|██████▋   | 1036/1563 [00:29<00:16, 32.25it/s]

batch 1030 loss: 0.4021585613489151


Train, Epoch 10 / 20:  67%|██████▋   | 1044/1563 [00:29<00:16, 31.51it/s]

batch 1040 loss: 0.3786310225725174


Train, Epoch 10 / 20:  68%|██████▊   | 1056/1563 [00:30<00:16, 30.85it/s]

batch 1050 loss: 0.329340435564518


Train, Epoch 10 / 20:  68%|██████▊   | 1064/1563 [00:30<00:15, 31.25it/s]

batch 1060 loss: 0.387354177236557


Train, Epoch 10 / 20:  69%|██████▉   | 1076/1563 [00:30<00:15, 30.99it/s]

batch 1070 loss: 0.386600686609745


Train, Epoch 10 / 20:  69%|██████▉   | 1084/1563 [00:31<00:14, 32.04it/s]

batch 1080 loss: 0.513224920630455


Train, Epoch 10 / 20:  70%|███████   | 1096/1563 [00:31<00:14, 32.78it/s]

batch 1090 loss: 0.3885683834552765


Train, Epoch 10 / 20:  71%|███████   | 1104/1563 [00:31<00:13, 34.35it/s]

batch 1100 loss: 0.3659797102212906


Train, Epoch 10 / 20:  71%|███████▏  | 1116/1563 [00:31<00:12, 35.10it/s]

batch 1110 loss: 0.36039652228355407


Train, Epoch 10 / 20:  72%|███████▏  | 1124/1563 [00:32<00:12, 35.58it/s]

batch 1120 loss: 0.4854047238826752


Train, Epoch 10 / 20:  73%|███████▎  | 1136/1563 [00:32<00:11, 35.70it/s]

batch 1130 loss: 0.411430624127388


Train, Epoch 10 / 20:  73%|███████▎  | 1144/1563 [00:32<00:11, 35.09it/s]

batch 1140 loss: 0.3673537626862526


Train, Epoch 10 / 20:  74%|███████▍  | 1156/1563 [00:33<00:11, 35.41it/s]

batch 1150 loss: 0.38498110324144363


Train, Epoch 10 / 20:  74%|███████▍  | 1164/1563 [00:33<00:11, 35.73it/s]

batch 1160 loss: 0.33278135359287264


Train, Epoch 10 / 20:  75%|███████▌  | 1176/1563 [00:33<00:10, 36.13it/s]

batch 1170 loss: 0.43386157155036925


Train, Epoch 10 / 20:  76%|███████▌  | 1184/1563 [00:33<00:10, 36.00it/s]

batch 1180 loss: 0.3288179710507393


Train, Epoch 10 / 20:  77%|███████▋  | 1196/1563 [00:34<00:09, 36.83it/s]

batch 1190 loss: 0.38012656271457673


Train, Epoch 10 / 20:  77%|███████▋  | 1204/1563 [00:34<00:09, 36.75it/s]

batch 1200 loss: 0.3208457201719284


Train, Epoch 10 / 20:  78%|███████▊  | 1216/1563 [00:34<00:09, 36.25it/s]

batch 1210 loss: 0.48979065716266634


Train, Epoch 10 / 20:  78%|███████▊  | 1224/1563 [00:34<00:09, 36.06it/s]

batch 1220 loss: 0.41560856997966766


Train, Epoch 10 / 20:  79%|███████▉  | 1236/1563 [00:35<00:09, 36.24it/s]

batch 1230 loss: 0.4016388952732086


Train, Epoch 10 / 20:  80%|███████▉  | 1244/1563 [00:35<00:08, 36.06it/s]

batch 1240 loss: 0.38044243901968


Train, Epoch 10 / 20:  80%|████████  | 1256/1563 [00:35<00:08, 35.60it/s]

batch 1250 loss: 0.30544053614139555


Train, Epoch 10 / 20:  81%|████████  | 1264/1563 [00:36<00:08, 35.79it/s]

batch 1260 loss: 0.44090238213539124


Train, Epoch 10 / 20:  82%|████████▏ | 1276/1563 [00:36<00:08, 35.77it/s]

batch 1270 loss: 0.3851853862404823


Train, Epoch 10 / 20:  82%|████████▏ | 1284/1563 [00:36<00:07, 36.04it/s]

batch 1280 loss: 0.4045133411884308


Train, Epoch 10 / 20:  83%|████████▎ | 1296/1563 [00:36<00:07, 36.20it/s]

batch 1290 loss: 0.4209931880235672


Train, Epoch 10 / 20:  83%|████████▎ | 1304/1563 [00:37<00:07, 36.45it/s]

batch 1300 loss: 0.4002427101135254


Train, Epoch 10 / 20:  84%|████████▍ | 1316/1563 [00:37<00:06, 36.10it/s]

batch 1310 loss: 0.47965866327285767


Train, Epoch 10 / 20:  85%|████████▍ | 1324/1563 [00:37<00:06, 35.63it/s]

batch 1320 loss: 0.4143982917070389


Train, Epoch 10 / 20:  85%|████████▌ | 1336/1563 [00:38<00:06, 36.24it/s]

batch 1330 loss: 0.3754282906651497


Train, Epoch 10 / 20:  86%|████████▌ | 1344/1563 [00:38<00:06, 36.40it/s]

batch 1340 loss: 0.3712498307228088


Train, Epoch 10 / 20:  87%|████████▋ | 1356/1563 [00:38<00:05, 36.13it/s]

batch 1350 loss: 0.4263694047927856


Train, Epoch 10 / 20:  87%|████████▋ | 1364/1563 [00:38<00:05, 35.53it/s]

batch 1360 loss: 0.4655658781528473


Train, Epoch 10 / 20:  88%|████████▊ | 1376/1563 [00:39<00:05, 35.83it/s]

batch 1370 loss: 0.3976148873567581


Train, Epoch 10 / 20:  89%|████████▊ | 1384/1563 [00:39<00:04, 35.91it/s]

batch 1380 loss: 0.438338029384613


Train, Epoch 10 / 20:  89%|████████▉ | 1396/1563 [00:39<00:04, 35.43it/s]

batch 1390 loss: 0.38404748141765593


Train, Epoch 10 / 20:  90%|████████▉ | 1404/1563 [00:39<00:04, 35.63it/s]

batch 1400 loss: 0.4500767648220062


Train, Epoch 10 / 20:  91%|█████████ | 1416/1563 [00:40<00:04, 35.87it/s]

batch 1410 loss: 0.3890182673931122


Train, Epoch 10 / 20:  91%|█████████ | 1424/1563 [00:40<00:03, 35.88it/s]

batch 1420 loss: 0.39432714581489564


Train, Epoch 10 / 20:  92%|█████████▏| 1436/1563 [00:40<00:03, 36.10it/s]

batch 1430 loss: 0.39047421514987946


Train, Epoch 10 / 20:  92%|█████████▏| 1444/1563 [00:41<00:03, 35.94it/s]

batch 1440 loss: 0.38132239133119583


Train, Epoch 10 / 20:  93%|█████████▎| 1456/1563 [00:41<00:03, 35.21it/s]

batch 1450 loss: 0.37906416952610017


Train, Epoch 10 / 20:  94%|█████████▎| 1464/1563 [00:41<00:03, 32.80it/s]

batch 1460 loss: 0.331359800696373


Train, Epoch 10 / 20:  94%|█████████▍| 1476/1563 [00:42<00:02, 31.27it/s]

batch 1470 loss: 0.4566833660006523


Train, Epoch 10 / 20:  95%|█████████▍| 1484/1563 [00:42<00:02, 32.60it/s]

batch 1480 loss: 0.39055086970329284


Train, Epoch 10 / 20:  96%|█████████▌| 1496/1563 [00:42<00:02, 33.42it/s]

batch 1490 loss: 0.36648325324058534


Train, Epoch 10 / 20:  96%|█████████▌| 1504/1563 [00:42<00:01, 32.77it/s]

batch 1500 loss: 0.3826207160949707


Train, Epoch 10 / 20:  97%|█████████▋| 1516/1563 [00:43<00:01, 32.37it/s]

batch 1510 loss: 0.43692601323127744


Train, Epoch 10 / 20:  98%|█████████▊| 1524/1563 [00:43<00:01, 31.75it/s]

batch 1520 loss: 0.4014311611652374


Train, Epoch 10 / 20:  98%|█████████▊| 1536/1563 [00:43<00:00, 32.43it/s]

batch 1530 loss: 0.40232907682657243


Train, Epoch 10 / 20:  99%|█████████▉| 1544/1563 [00:44<00:00, 33.74it/s]

batch 1540 loss: 0.3535422205924988


Train, Epoch 10 / 20: 100%|█████████▉| 1556/1563 [00:44<00:00, 34.79it/s]

batch 1550 loss: 0.4613927111029625


Train, Epoch 10 / 20: 100%|██████████| 1563/1563 [00:44<00:00, 34.96it/s]


batch 1560 loss: 0.3264722257852554


Test, Epoch 10 / 20: 100%|██████████| 1563/1563 [00:20<00:00, 74.45it/s]


Epoch 10, loss: 0.4715733430159092, accuracy: 0.78532


Train, Epoch 11 / 20:   1%|          | 16/1563 [00:00<00:42, 36.04it/s]

batch 10 loss: 0.4578952472656965


Train, Epoch 11 / 20:   2%|▏         | 24/1563 [00:00<00:44, 34.84it/s]

batch 20 loss: 0.3683154284954071


Train, Epoch 11 / 20:   2%|▏         | 36/1563 [00:01<00:48, 31.61it/s]

batch 30 loss: 0.415055912733078


Train, Epoch 11 / 20:   3%|▎         | 44/1563 [00:01<00:47, 31.81it/s]

batch 40 loss: 0.33987381160259245


Train, Epoch 11 / 20:   4%|▎         | 56/1563 [00:01<00:47, 31.50it/s]

batch 50 loss: 0.4008301332592964


Train, Epoch 11 / 20:   4%|▍         | 64/1563 [00:01<00:45, 32.75it/s]

batch 60 loss: 0.43006969392299654


Train, Epoch 11 / 20:   5%|▍         | 76/1563 [00:02<00:45, 32.39it/s]

batch 70 loss: 0.4039163053035736


Train, Epoch 11 / 20:   5%|▌         | 84/1563 [00:02<00:45, 32.27it/s]

batch 80 loss: 0.3822209596633911


Train, Epoch 11 / 20:   6%|▌         | 96/1563 [00:02<00:46, 31.29it/s]

batch 90 loss: 0.4265455141663551


Train, Epoch 11 / 20:   7%|▋         | 104/1563 [00:03<00:46, 31.64it/s]

batch 100 loss: 0.34488757848739626


Train, Epoch 11 / 20:   7%|▋         | 116/1563 [00:03<00:42, 34.28it/s]

batch 110 loss: 0.32974945604801176


Train, Epoch 11 / 20:   8%|▊         | 124/1563 [00:03<00:40, 35.30it/s]

batch 120 loss: 0.2773295521736145


Train, Epoch 11 / 20:   9%|▊         | 136/1563 [00:04<00:40, 35.57it/s]

batch 130 loss: 0.3963215246796608


Train, Epoch 11 / 20:   9%|▉         | 144/1563 [00:04<00:40, 35.12it/s]

batch 140 loss: 0.34521848112344744


Train, Epoch 11 / 20:  10%|▉         | 156/1563 [00:04<00:40, 35.12it/s]

batch 150 loss: 0.3825329229235649


Train, Epoch 11 / 20:  10%|█         | 164/1563 [00:04<00:39, 35.51it/s]

batch 160 loss: 0.41324355006217955


Train, Epoch 11 / 20:  11%|█▏        | 176/1563 [00:05<00:39, 35.22it/s]

batch 170 loss: 0.4435655325651169


Train, Epoch 11 / 20:  12%|█▏        | 184/1563 [00:05<00:39, 35.23it/s]

batch 180 loss: 0.511651536822319


Train, Epoch 11 / 20:  13%|█▎        | 196/1563 [00:05<00:37, 36.21it/s]

batch 190 loss: 0.3564274534583092


Train, Epoch 11 / 20:  13%|█▎        | 204/1563 [00:06<00:38, 35.69it/s]

batch 200 loss: 0.39621425569057467


Train, Epoch 11 / 20:  14%|█▍        | 216/1563 [00:06<00:37, 36.29it/s]

batch 210 loss: 0.49060112684965135


Train, Epoch 11 / 20:  14%|█▍        | 224/1563 [00:06<00:37, 35.79it/s]

batch 220 loss: 0.3694186359643936


Train, Epoch 11 / 20:  15%|█▌        | 236/1563 [00:06<00:36, 36.57it/s]

batch 230 loss: 0.39093911945819854


Train, Epoch 11 / 20:  16%|█▌        | 244/1563 [00:07<00:36, 36.17it/s]

batch 240 loss: 0.3530800297856331


Train, Epoch 11 / 20:  16%|█▋        | 256/1563 [00:07<00:36, 35.98it/s]

batch 250 loss: 0.4245562434196472


Train, Epoch 11 / 20:  17%|█▋        | 264/1563 [00:07<00:36, 35.65it/s]

batch 260 loss: 0.43096487522125243


Train, Epoch 11 / 20:  18%|█▊        | 276/1563 [00:08<00:35, 36.33it/s]

batch 270 loss: 0.3795063778758049


Train, Epoch 11 / 20:  18%|█▊        | 284/1563 [00:08<00:35, 36.46it/s]

batch 280 loss: 0.42444460839033127


Train, Epoch 11 / 20:  19%|█▉        | 296/1563 [00:08<00:34, 36.66it/s]

batch 290 loss: 0.3871150016784668


Train, Epoch 11 / 20:  19%|█▉        | 304/1563 [00:08<00:34, 36.03it/s]

batch 300 loss: 0.3556518018245697


Train, Epoch 11 / 20:  20%|██        | 316/1563 [00:09<00:34, 36.07it/s]

batch 310 loss: 0.34513860046863554


Train, Epoch 11 / 20:  21%|██        | 324/1563 [00:09<00:34, 36.44it/s]

batch 320 loss: 0.41651047915220263


Train, Epoch 11 / 20:  21%|██▏       | 336/1563 [00:09<00:33, 36.21it/s]

batch 330 loss: 0.40658304244279864


Train, Epoch 11 / 20:  22%|██▏       | 344/1563 [00:09<00:33, 35.89it/s]

batch 340 loss: 0.4145544618368149


Train, Epoch 11 / 20:  23%|██▎       | 356/1563 [00:10<00:33, 36.15it/s]

batch 350 loss: 0.34084135591983794


Train, Epoch 11 / 20:  23%|██▎       | 364/1563 [00:10<00:33, 35.81it/s]

batch 360 loss: 0.33118706196546555


Train, Epoch 11 / 20:  24%|██▍       | 376/1563 [00:10<00:32, 36.06it/s]

batch 370 loss: 0.36230340600013733


Train, Epoch 11 / 20:  25%|██▍       | 384/1563 [00:10<00:32, 35.95it/s]

batch 380 loss: 0.34358448535203934


Train, Epoch 11 / 20:  25%|██▌       | 396/1563 [00:11<00:32, 36.25it/s]

batch 390 loss: 0.44358663260936737


Train, Epoch 11 / 20:  26%|██▌       | 404/1563 [00:11<00:32, 35.66it/s]

batch 400 loss: 0.40191483199596406


Train, Epoch 11 / 20:  27%|██▋       | 416/1563 [00:11<00:31, 35.95it/s]

batch 410 loss: 0.36458013355731966


Train, Epoch 11 / 20:  27%|██▋       | 424/1563 [00:12<00:31, 35.73it/s]

batch 420 loss: 0.38824596256017685


Train, Epoch 11 / 20:  28%|██▊       | 436/1563 [00:12<00:31, 35.77it/s]

batch 430 loss: 0.4550010859966278


Train, Epoch 11 / 20:  28%|██▊       | 444/1563 [00:12<00:31, 35.87it/s]

batch 440 loss: 0.44576531946659087


Train, Epoch 11 / 20:  29%|██▉       | 456/1563 [00:12<00:30, 36.47it/s]

batch 450 loss: 0.3561050325632095


Train, Epoch 11 / 20:  30%|██▉       | 464/1563 [00:13<00:30, 35.60it/s]

batch 460 loss: 0.34512379169464114


Train, Epoch 11 / 20:  30%|███       | 476/1563 [00:13<00:33, 32.52it/s]

batch 470 loss: 0.4491466641426086


Train, Epoch 11 / 20:  31%|███       | 484/1563 [00:13<00:33, 32.17it/s]

batch 480 loss: 0.3644159764051437


Train, Epoch 11 / 20:  32%|███▏      | 496/1563 [00:14<00:32, 33.04it/s]

batch 490 loss: 0.39993124455213547


Train, Epoch 11 / 20:  32%|███▏      | 504/1563 [00:14<00:32, 32.67it/s]

batch 500 loss: 0.37803483158349993


Train, Epoch 11 / 20:  33%|███▎      | 516/1563 [00:14<00:31, 32.93it/s]

batch 510 loss: 0.46266039162874223


Train, Epoch 11 / 20:  34%|███▎      | 524/1563 [00:15<00:32, 31.86it/s]

batch 520 loss: 0.35187124609947207


Train, Epoch 11 / 20:  34%|███▍      | 536/1563 [00:15<00:33, 30.86it/s]

batch 530 loss: 0.4009856581687927


Train, Epoch 11 / 20:  35%|███▍      | 544/1563 [00:15<00:31, 31.90it/s]

batch 540 loss: 0.3498373493552208


Train, Epoch 11 / 20:  36%|███▌      | 556/1563 [00:16<00:28, 34.85it/s]

batch 550 loss: 0.39131672531366346


Train, Epoch 11 / 20:  36%|███▌      | 564/1563 [00:16<00:28, 35.67it/s]

batch 560 loss: 0.4934947147965431


Train, Epoch 11 / 20:  37%|███▋      | 576/1563 [00:16<00:27, 35.50it/s]

batch 570 loss: 0.5498910039663315


Train, Epoch 11 / 20:  37%|███▋      | 584/1563 [00:16<00:27, 35.79it/s]

batch 580 loss: 0.32777325063943863


Train, Epoch 11 / 20:  38%|███▊      | 596/1563 [00:17<00:26, 36.12it/s]

batch 590 loss: 0.41522955149412155


Train, Epoch 11 / 20:  39%|███▊      | 604/1563 [00:17<00:26, 36.10it/s]

batch 600 loss: 0.37788939774036406


Train, Epoch 11 / 20:  39%|███▉      | 616/1563 [00:17<00:26, 35.90it/s]

batch 610 loss: 0.3567127361893654


Train, Epoch 11 / 20:  40%|███▉      | 624/1563 [00:17<00:26, 35.64it/s]

batch 620 loss: 0.4309971690177917


Train, Epoch 11 / 20:  41%|████      | 636/1563 [00:18<00:25, 35.78it/s]

batch 630 loss: 0.3886712834239006


Train, Epoch 11 / 20:  41%|████      | 644/1563 [00:18<00:25, 35.96it/s]

batch 640 loss: 0.3651234611868858


Train, Epoch 11 / 20:  42%|████▏     | 656/1563 [00:18<00:25, 35.81it/s]

batch 650 loss: 0.35730513334274294


Train, Epoch 11 / 20:  42%|████▏     | 664/1563 [00:19<00:25, 35.75it/s]

batch 660 loss: 0.3853509098291397


Train, Epoch 11 / 20:  43%|████▎     | 676/1563 [00:19<00:24, 36.19it/s]

batch 670 loss: 0.3298368126153946


Train, Epoch 11 / 20:  44%|████▍     | 684/1563 [00:19<00:24, 36.19it/s]

batch 680 loss: 0.37643676400184634


Train, Epoch 11 / 20:  45%|████▍     | 696/1563 [00:19<00:24, 35.53it/s]

batch 690 loss: 0.3755478248000145


Train, Epoch 11 / 20:  45%|████▌     | 704/1563 [00:20<00:23, 36.20it/s]

batch 700 loss: 0.37567019313573835


Train, Epoch 11 / 20:  46%|████▌     | 716/1563 [00:20<00:23, 36.63it/s]

batch 710 loss: 0.4289026394486427


Train, Epoch 11 / 20:  46%|████▋     | 724/1563 [00:20<00:23, 36.12it/s]

batch 720 loss: 0.4844371199607849


Train, Epoch 11 / 20:  47%|████▋     | 736/1563 [00:21<00:23, 35.80it/s]

batch 730 loss: 0.3275909602642059


Train, Epoch 11 / 20:  48%|████▊     | 744/1563 [00:21<00:22, 36.32it/s]

batch 740 loss: 0.39400345683097837


Train, Epoch 11 / 20:  48%|████▊     | 756/1563 [00:21<00:22, 36.46it/s]

batch 750 loss: 0.3180506587028503


Train, Epoch 11 / 20:  49%|████▉     | 764/1563 [00:21<00:22, 36.05it/s]

batch 760 loss: 0.45387449860572815


Train, Epoch 11 / 20:  50%|████▉     | 776/1563 [00:22<00:21, 36.28it/s]

batch 770 loss: 0.28263640999794004


Train, Epoch 11 / 20:  50%|█████     | 784/1563 [00:22<00:21, 36.24it/s]

batch 780 loss: 0.38047238141298295


Train, Epoch 11 / 20:  51%|█████     | 796/1563 [00:22<00:21, 36.52it/s]

batch 790 loss: 0.38393048346042635


Train, Epoch 11 / 20:  51%|█████▏    | 804/1563 [00:22<00:21, 35.63it/s]

batch 800 loss: 0.3480416864156723


Train, Epoch 11 / 20:  52%|█████▏    | 816/1563 [00:23<00:20, 35.98it/s]

batch 810 loss: 0.47937125265598296


Train, Epoch 11 / 20:  53%|█████▎    | 824/1563 [00:23<00:20, 35.64it/s]

batch 820 loss: 0.3170410193502903


Train, Epoch 11 / 20:  53%|█████▎    | 836/1563 [00:23<00:20, 35.61it/s]

batch 830 loss: 0.3661168560385704


Train, Epoch 11 / 20:  54%|█████▍    | 844/1563 [00:24<00:20, 35.34it/s]

batch 840 loss: 0.38249174505472183


Train, Epoch 11 / 20:  55%|█████▍    | 856/1563 [00:24<00:19, 36.31it/s]

batch 850 loss: 0.37555635422468187


Train, Epoch 11 / 20:  55%|█████▌    | 864/1563 [00:24<00:19, 36.43it/s]

batch 860 loss: 0.42267019003629686


Train, Epoch 11 / 20:  56%|█████▌    | 876/1563 [00:24<00:19, 35.97it/s]

batch 870 loss: 0.3605405181646347


Train, Epoch 11 / 20:  57%|█████▋    | 884/1563 [00:25<00:19, 35.61it/s]

batch 880 loss: 0.34136280715465545


Train, Epoch 11 / 20:  57%|█████▋    | 896/1563 [00:25<00:18, 36.17it/s]

batch 890 loss: 0.32310619577765465


Train, Epoch 11 / 20:  58%|█████▊    | 904/1563 [00:25<00:19, 34.54it/s]

batch 900 loss: 0.3406603381037712


Train, Epoch 11 / 20:  59%|█████▊    | 916/1563 [00:26<00:19, 32.51it/s]

batch 910 loss: 0.42058471739292147


Train, Epoch 11 / 20:  59%|█████▉    | 924/1563 [00:26<00:20, 31.87it/s]

batch 920 loss: 0.44144883900880816


Train, Epoch 11 / 20:  60%|█████▉    | 936/1563 [00:26<00:20, 30.88it/s]

batch 930 loss: 0.3819736324250698


Train, Epoch 11 / 20:  60%|██████    | 944/1563 [00:27<00:20, 30.90it/s]

batch 940 loss: 0.3614700883626938


Train, Epoch 11 / 20:  61%|██████    | 952/1563 [00:27<00:19, 31.41it/s]

batch 950 loss: 0.43434693813323977


Train, Epoch 11 / 20:  62%|██████▏   | 963/1563 [00:27<00:20, 29.48it/s]

batch 960 loss: 0.44297271966934204


Train, Epoch 11 / 20:  62%|██████▏   | 975/1563 [00:28<00:20, 29.16it/s]

batch 970 loss: 0.3648657500743866


Train, Epoch 11 / 20:  63%|██████▎   | 987/1563 [00:28<00:17, 33.11it/s]

batch 980 loss: 0.3896745443344116


Train, Epoch 11 / 20:  64%|██████▎   | 995/1563 [00:28<00:16, 34.45it/s]

batch 990 loss: 0.3623704582452774


Train, Epoch 11 / 20:  64%|██████▍   | 1007/1563 [00:29<00:15, 34.96it/s]

batch 1000 loss: 0.286951519548893


Train, Epoch 11 / 20:  65%|██████▍   | 1015/1563 [00:29<00:15, 35.01it/s]

batch 1010 loss: 0.4219524532556534


Train, Epoch 11 / 20:  66%|██████▌   | 1027/1563 [00:29<00:15, 35.62it/s]

batch 1020 loss: 0.44431510865688323


Train, Epoch 11 / 20:  66%|██████▌   | 1035/1563 [00:29<00:14, 35.74it/s]

batch 1030 loss: 0.3759191006422043


Train, Epoch 11 / 20:  67%|██████▋   | 1047/1563 [00:30<00:14, 36.05it/s]

batch 1040 loss: 0.30583139657974245


Train, Epoch 11 / 20:  67%|██████▋   | 1055/1563 [00:30<00:14, 35.71it/s]

batch 1050 loss: 0.3979897305369377


Train, Epoch 11 / 20:  68%|██████▊   | 1067/1563 [00:30<00:14, 35.28it/s]

batch 1060 loss: 0.3693293139338493


Train, Epoch 11 / 20:  69%|██████▉   | 1075/1563 [00:30<00:13, 35.61it/s]

batch 1070 loss: 0.3640949174761772


Train, Epoch 11 / 20:  70%|██████▉   | 1087/1563 [00:31<00:13, 36.21it/s]

batch 1080 loss: 0.3056627199053764


Train, Epoch 11 / 20:  70%|███████   | 1095/1563 [00:31<00:12, 36.35it/s]

batch 1090 loss: 0.42129229158163073


Train, Epoch 11 / 20:  71%|███████   | 1107/1563 [00:31<00:12, 36.48it/s]

batch 1100 loss: 0.413514107465744


Train, Epoch 11 / 20:  71%|███████▏  | 1115/1563 [00:32<00:12, 36.58it/s]

batch 1110 loss: 0.33784005641937254


Train, Epoch 11 / 20:  72%|███████▏  | 1127/1563 [00:32<00:11, 36.45it/s]

batch 1120 loss: 0.3703710988163948


Train, Epoch 11 / 20:  73%|███████▎  | 1135/1563 [00:32<00:11, 36.30it/s]

batch 1130 loss: 0.3948418706655502


Train, Epoch 11 / 20:  73%|███████▎  | 1147/1563 [00:32<00:11, 36.53it/s]

batch 1140 loss: 0.3945988789200783


Train, Epoch 11 / 20:  74%|███████▍  | 1155/1563 [00:33<00:11, 35.84it/s]

batch 1150 loss: 0.3161202259361744


Train, Epoch 11 / 20:  75%|███████▍  | 1167/1563 [00:33<00:11, 35.95it/s]

batch 1160 loss: 0.4868568107485771


Train, Epoch 11 / 20:  75%|███████▌  | 1175/1563 [00:33<00:10, 35.82it/s]

batch 1170 loss: 0.36716277301311495


Train, Epoch 11 / 20:  76%|███████▌  | 1187/1563 [00:34<00:10, 35.20it/s]

batch 1180 loss: 0.39069308042526246


Train, Epoch 11 / 20:  76%|███████▋  | 1195/1563 [00:34<00:10, 34.63it/s]

batch 1190 loss: 0.2975857719779015


Train, Epoch 11 / 20:  77%|███████▋  | 1207/1563 [00:34<00:10, 35.35it/s]

batch 1200 loss: 0.37435808330774306


Train, Epoch 11 / 20:  78%|███████▊  | 1215/1563 [00:34<00:09, 35.28it/s]

batch 1210 loss: 0.41559692174196244


Train, Epoch 11 / 20:  79%|███████▊  | 1227/1563 [00:35<00:09, 35.17it/s]

batch 1220 loss: 0.40250632613897325


Train, Epoch 11 / 20:  79%|███████▉  | 1235/1563 [00:35<00:09, 35.40it/s]

batch 1230 loss: 0.3559138387441635


Train, Epoch 11 / 20:  80%|███████▉  | 1247/1563 [00:35<00:08, 36.32it/s]

batch 1240 loss: 0.3258329495787621


Train, Epoch 11 / 20:  80%|████████  | 1255/1563 [00:35<00:08, 36.10it/s]

batch 1250 loss: 0.33999419063329694


Train, Epoch 11 / 20:  81%|████████  | 1267/1563 [00:36<00:08, 35.47it/s]

batch 1260 loss: 0.4414595812559128


Train, Epoch 11 / 20:  82%|████████▏ | 1275/1563 [00:36<00:08, 35.88it/s]

batch 1270 loss: 0.39561927318573


Train, Epoch 11 / 20:  82%|████████▏ | 1287/1563 [00:36<00:07, 36.34it/s]

batch 1280 loss: 0.3340324550867081


Train, Epoch 11 / 20:  83%|████████▎ | 1295/1563 [00:37<00:07, 36.23it/s]

batch 1290 loss: 0.3747144415974617


Train, Epoch 11 / 20:  84%|████████▎ | 1307/1563 [00:37<00:07, 36.11it/s]

batch 1300 loss: 0.4275437444448471


Train, Epoch 11 / 20:  84%|████████▍ | 1315/1563 [00:37<00:06, 35.86it/s]

batch 1310 loss: 0.4030206561088562


Train, Epoch 11 / 20:  85%|████████▍ | 1327/1563 [00:37<00:06, 36.18it/s]

batch 1320 loss: 0.38737678378820417


Train, Epoch 11 / 20:  85%|████████▌ | 1335/1563 [00:38<00:06, 35.20it/s]

batch 1330 loss: 0.3913624152541161


Train, Epoch 11 / 20:  86%|████████▌ | 1343/1563 [00:38<00:06, 33.69it/s]

batch 1340 loss: 0.4042541205883026


Train, Epoch 11 / 20:  87%|████████▋ | 1355/1563 [00:38<00:06, 31.55it/s]

batch 1350 loss: 0.3567430809140205


Train, Epoch 11 / 20:  87%|████████▋ | 1363/1563 [00:39<00:06, 32.05it/s]

batch 1360 loss: 0.35471762269735335


Train, Epoch 11 / 20:  88%|████████▊ | 1375/1563 [00:39<00:05, 32.12it/s]

batch 1370 loss: 0.2985114350914955


Train, Epoch 11 / 20:  88%|████████▊ | 1383/1563 [00:39<00:05, 32.42it/s]

batch 1380 loss: 0.3439808964729309


Train, Epoch 11 / 20:  89%|████████▉ | 1395/1563 [00:40<00:05, 31.61it/s]

batch 1390 loss: 0.405607770383358


Train, Epoch 11 / 20:  90%|████████▉ | 1403/1563 [00:40<00:05, 31.80it/s]

batch 1400 loss: 0.35159172862768173


Train, Epoch 11 / 20:  91%|█████████ | 1415/1563 [00:40<00:04, 32.41it/s]

batch 1410 loss: 0.3526912197470665


Train, Epoch 11 / 20:  91%|█████████▏| 1427/1563 [00:41<00:03, 34.77it/s]

batch 1420 loss: 0.41005985289812086


Train, Epoch 11 / 20:  92%|█████████▏| 1435/1563 [00:41<00:03, 35.59it/s]

batch 1430 loss: 0.39429316371679307


Train, Epoch 11 / 20:  93%|█████████▎| 1447/1563 [00:41<00:03, 35.94it/s]

batch 1440 loss: 0.30338709950447085


Train, Epoch 11 / 20:  93%|█████████▎| 1455/1563 [00:41<00:03, 35.89it/s]

batch 1450 loss: 0.38423720598220823


Train, Epoch 11 / 20:  94%|█████████▍| 1467/1563 [00:42<00:02, 36.16it/s]

batch 1460 loss: 0.3854599595069885


Train, Epoch 11 / 20:  94%|█████████▍| 1475/1563 [00:42<00:02, 35.75it/s]

batch 1470 loss: 0.35052171647548674


Train, Epoch 11 / 20:  95%|█████████▌| 1487/1563 [00:42<00:02, 35.92it/s]

batch 1480 loss: 0.40612917989492414


Train, Epoch 11 / 20:  96%|█████████▌| 1495/1563 [00:42<00:01, 35.43it/s]

batch 1490 loss: 0.3557990208268166


Train, Epoch 11 / 20:  96%|█████████▋| 1507/1563 [00:43<00:01, 36.34it/s]

batch 1500 loss: 0.34759763479232786


Train, Epoch 11 / 20:  97%|█████████▋| 1515/1563 [00:43<00:01, 35.81it/s]

batch 1510 loss: 0.34471098631620406


Train, Epoch 11 / 20:  98%|█████████▊| 1527/1563 [00:43<00:00, 36.15it/s]

batch 1520 loss: 0.3271307185292244


Train, Epoch 11 / 20:  98%|█████████▊| 1535/1563 [00:44<00:00, 35.89it/s]

batch 1530 loss: 0.4859467148780823


Train, Epoch 11 / 20:  99%|█████████▉| 1547/1563 [00:44<00:00, 35.69it/s]

batch 1540 loss: 0.4364019274711609


Train, Epoch 11 / 20:  99%|█████████▉| 1555/1563 [00:44<00:00, 35.48it/s]

batch 1550 loss: 0.3716154396533966


Train, Epoch 11 / 20: 100%|██████████| 1563/1563 [00:44<00:00, 34.84it/s]


batch 1560 loss: 0.39671264588832855


Test, Epoch 11 / 20: 100%|██████████| 1563/1563 [00:21<00:00, 72.01it/s]


Epoch 11, loss: 0.4549656367206574, accuracy: 0.79284


Train, Epoch 12 / 20:   1%|          | 16/1563 [00:00<00:42, 36.11it/s]

batch 10 loss: 0.368330816924572


Train, Epoch 12 / 20:   2%|▏         | 24/1563 [00:00<00:43, 35.58it/s]

batch 20 loss: 0.4851144403219223


Train, Epoch 12 / 20:   2%|▏         | 36/1563 [00:01<00:42, 36.28it/s]

batch 30 loss: 0.41193114817142484


Train, Epoch 12 / 20:   3%|▎         | 44/1563 [00:01<00:41, 36.19it/s]

batch 40 loss: 0.35316959023475647


Train, Epoch 12 / 20:   4%|▎         | 56/1563 [00:01<00:42, 35.49it/s]

batch 50 loss: 0.4399311065673828


Train, Epoch 12 / 20:   4%|▍         | 64/1563 [00:01<00:42, 35.38it/s]

batch 60 loss: 0.3587848633527756


Train, Epoch 12 / 20:   5%|▍         | 76/1563 [00:02<00:41, 36.20it/s]

batch 70 loss: 0.34924918562173846


Train, Epoch 12 / 20:   5%|▌         | 84/1563 [00:02<00:40, 36.16it/s]

batch 80 loss: 0.3336936041712761


Train, Epoch 12 / 20:   6%|▌         | 96/1563 [00:02<00:40, 36.00it/s]

batch 90 loss: 0.3139104187488556


Train, Epoch 12 / 20:   7%|▋         | 104/1563 [00:02<00:40, 36.23it/s]

batch 100 loss: 0.33091839849948884


Train, Epoch 12 / 20:   7%|▋         | 116/1563 [00:03<00:39, 36.23it/s]

batch 110 loss: 0.3790941059589386


Train, Epoch 12 / 20:   8%|▊         | 124/1563 [00:03<00:39, 36.13it/s]

batch 120 loss: 0.3378704398870468


Train, Epoch 12 / 20:   9%|▊         | 136/1563 [00:03<00:39, 35.83it/s]

batch 130 loss: 0.3511973977088928


Train, Epoch 12 / 20:   9%|▉         | 144/1563 [00:04<00:39, 36.24it/s]

batch 140 loss: 0.44414920210838316


Train, Epoch 12 / 20:  10%|▉         | 156/1563 [00:04<00:39, 35.83it/s]

batch 150 loss: 0.2725295960903168


Train, Epoch 12 / 20:  10%|█         | 164/1563 [00:04<00:39, 35.58it/s]

batch 160 loss: 0.28250348567962646


Train, Epoch 12 / 20:  11%|█▏        | 176/1563 [00:04<00:38, 36.08it/s]

batch 170 loss: 0.3533665776252747


Train, Epoch 12 / 20:  12%|█▏        | 184/1563 [00:05<00:37, 36.40it/s]

batch 180 loss: 0.4117017284035683


Train, Epoch 12 / 20:  13%|█▎        | 196/1563 [00:05<00:37, 36.54it/s]

batch 190 loss: 0.3365134447813034


Train, Epoch 12 / 20:  13%|█▎        | 204/1563 [00:05<00:37, 36.04it/s]

batch 200 loss: 0.35456458032131194


Train, Epoch 12 / 20:  14%|█▍        | 216/1563 [00:05<00:36, 36.44it/s]

batch 210 loss: 0.40021085143089297


Train, Epoch 12 / 20:  14%|█▍        | 224/1563 [00:06<00:36, 36.29it/s]

batch 220 loss: 0.4270026683807373


Train, Epoch 12 / 20:  15%|█▌        | 236/1563 [00:06<00:36, 36.34it/s]

batch 230 loss: 0.3609511286020279


Train, Epoch 12 / 20:  16%|█▌        | 244/1563 [00:06<00:37, 35.38it/s]

batch 240 loss: 0.3761619806289673


Train, Epoch 12 / 20:  16%|█▋        | 256/1563 [00:07<00:36, 35.86it/s]

batch 250 loss: 0.4376230835914612


Train, Epoch 12 / 20:  17%|█▋        | 264/1563 [00:07<00:36, 35.51it/s]

batch 260 loss: 0.4061338439583778


Train, Epoch 12 / 20:  18%|█▊        | 276/1563 [00:07<00:35, 35.87it/s]

batch 270 loss: 0.37753353714942933


Train, Epoch 12 / 20:  18%|█▊        | 284/1563 [00:07<00:36, 35.48it/s]

batch 280 loss: 0.2848661780357361


Train, Epoch 12 / 20:  19%|█▉        | 296/1563 [00:08<00:35, 35.61it/s]

batch 290 loss: 0.41742787659168246


Train, Epoch 12 / 20:  19%|█▉        | 304/1563 [00:08<00:34, 36.13it/s]

batch 300 loss: 0.3742931917309761


Train, Epoch 12 / 20:  20%|██        | 316/1563 [00:08<00:34, 36.07it/s]

batch 310 loss: 0.4720927581191063


Train, Epoch 12 / 20:  21%|██        | 324/1563 [00:09<00:34, 36.16it/s]

batch 320 loss: 0.3758332714438438


Train, Epoch 12 / 20:  21%|██▏       | 336/1563 [00:09<00:36, 33.47it/s]

batch 330 loss: 0.3002245470881462


Train, Epoch 12 / 20:  22%|██▏       | 344/1563 [00:09<00:35, 34.03it/s]

batch 340 loss: 0.45541775077581403


Train, Epoch 12 / 20:  23%|██▎       | 356/1563 [00:09<00:37, 32.43it/s]

batch 350 loss: 0.40143786519765856


Train, Epoch 12 / 20:  23%|██▎       | 364/1563 [00:10<00:37, 32.34it/s]

batch 360 loss: 0.3712619036436081


Train, Epoch 12 / 20:  24%|██▍       | 376/1563 [00:10<00:37, 31.43it/s]

batch 370 loss: 0.28082913160324097


Train, Epoch 12 / 20:  25%|██▍       | 384/1563 [00:10<00:37, 31.06it/s]

batch 380 loss: 0.4299291580915451


Train, Epoch 12 / 20:  25%|██▌       | 396/1563 [00:11<00:37, 30.85it/s]

batch 390 loss: 0.35402041524648664


Train, Epoch 12 / 20:  26%|██▌       | 404/1563 [00:11<00:38, 30.06it/s]

batch 400 loss: 0.40002041310071945


Train, Epoch 12 / 20:  27%|██▋       | 416/1563 [00:11<00:34, 33.53it/s]

batch 410 loss: 0.32805571258068084


Train, Epoch 12 / 20:  27%|██▋       | 424/1563 [00:12<00:33, 34.39it/s]

batch 420 loss: 0.33708013147115706


Train, Epoch 12 / 20:  28%|██▊       | 436/1563 [00:12<00:31, 35.26it/s]

batch 430 loss: 0.3875316396355629


Train, Epoch 12 / 20:  28%|██▊       | 444/1563 [00:12<00:31, 35.22it/s]

batch 440 loss: 0.2507953204214573


Train, Epoch 12 / 20:  29%|██▉       | 456/1563 [00:13<00:30, 35.82it/s]

batch 450 loss: 0.43032373785972594


Train, Epoch 12 / 20:  30%|██▉       | 464/1563 [00:13<00:31, 35.20it/s]

batch 460 loss: 0.31115936785936354


Train, Epoch 12 / 20:  30%|███       | 476/1563 [00:13<00:30, 35.77it/s]

batch 470 loss: 0.38056062161922455


Train, Epoch 12 / 20:  31%|███       | 484/1563 [00:13<00:29, 36.20it/s]

batch 480 loss: 0.3455465942621231


Train, Epoch 12 / 20:  32%|███▏      | 496/1563 [00:14<00:30, 35.51it/s]

batch 490 loss: 0.35671579986810686


Train, Epoch 12 / 20:  32%|███▏      | 504/1563 [00:14<00:29, 36.08it/s]

batch 500 loss: 0.3691661924123764


Train, Epoch 12 / 20:  33%|███▎      | 516/1563 [00:14<00:29, 35.84it/s]

batch 510 loss: 0.3947849065065384


Train, Epoch 12 / 20:  34%|███▎      | 524/1563 [00:14<00:28, 35.93it/s]

batch 520 loss: 0.35563346594572065


Train, Epoch 12 / 20:  34%|███▍      | 536/1563 [00:15<00:28, 36.03it/s]

batch 530 loss: 0.2819187127053738


Train, Epoch 12 / 20:  35%|███▍      | 544/1563 [00:15<00:28, 35.56it/s]

batch 540 loss: 0.3459756374359131


Train, Epoch 12 / 20:  36%|███▌      | 556/1563 [00:15<00:28, 35.86it/s]

batch 550 loss: 0.3755694590508938


Train, Epoch 12 / 20:  36%|███▌      | 564/1563 [00:16<00:28, 35.43it/s]

batch 560 loss: 0.3601840823888779


Train, Epoch 12 / 20:  37%|███▋      | 576/1563 [00:16<00:27, 35.54it/s]

batch 570 loss: 0.4226634383201599


Train, Epoch 12 / 20:  37%|███▋      | 584/1563 [00:16<00:27, 35.77it/s]

batch 580 loss: 0.36603434979915617


Train, Epoch 12 / 20:  38%|███▊      | 596/1563 [00:16<00:26, 36.00it/s]

batch 590 loss: 0.38890004754066465


Train, Epoch 12 / 20:  39%|███▊      | 604/1563 [00:17<00:26, 35.57it/s]

batch 600 loss: 0.39848815500736234


Train, Epoch 12 / 20:  39%|███▉      | 616/1563 [00:17<00:26, 36.40it/s]

batch 610 loss: 0.34252998530864714


Train, Epoch 12 / 20:  40%|███▉      | 624/1563 [00:17<00:25, 36.53it/s]

batch 620 loss: 0.3287026211619377


Train, Epoch 12 / 20:  41%|████      | 636/1563 [00:18<00:25, 36.16it/s]

batch 630 loss: 0.35549787878990174


Train, Epoch 12 / 20:  41%|████      | 644/1563 [00:18<00:25, 35.55it/s]

batch 640 loss: 0.2896686837077141


Train, Epoch 12 / 20:  42%|████▏     | 656/1563 [00:18<00:24, 36.56it/s]

batch 650 loss: 0.3325172021985054


Train, Epoch 12 / 20:  42%|████▏     | 664/1563 [00:18<00:24, 36.17it/s]

batch 660 loss: 0.3047702372074127


Train, Epoch 12 / 20:  43%|████▎     | 676/1563 [00:19<00:25, 35.31it/s]

batch 670 loss: 0.3421586208045483


Train, Epoch 12 / 20:  44%|████▍     | 684/1563 [00:19<00:24, 35.65it/s]

batch 680 loss: 0.35335162580013274


Train, Epoch 12 / 20:  45%|████▍     | 696/1563 [00:19<00:24, 35.79it/s]

batch 690 loss: 0.4736739546060562


Train, Epoch 12 / 20:  45%|████▌     | 704/1563 [00:19<00:24, 35.68it/s]

batch 700 loss: 0.3712760999798775


Train, Epoch 12 / 20:  46%|████▌     | 716/1563 [00:20<00:23, 35.56it/s]

batch 710 loss: 0.29671726524829867


Train, Epoch 12 / 20:  46%|████▋     | 724/1563 [00:20<00:23, 35.95it/s]

batch 720 loss: 0.2854983925819397


Train, Epoch 12 / 20:  47%|████▋     | 736/1563 [00:20<00:23, 35.62it/s]

batch 730 loss: 0.3269848808646202


Train, Epoch 12 / 20:  48%|████▊     | 744/1563 [00:21<00:22, 35.77it/s]

batch 740 loss: 0.3583401992917061


Train, Epoch 12 / 20:  48%|████▊     | 756/1563 [00:21<00:22, 35.63it/s]

batch 750 loss: 0.38716684505343435


Train, Epoch 12 / 20:  49%|████▉     | 764/1563 [00:21<00:23, 34.40it/s]

batch 760 loss: 0.362243740260601


Train, Epoch 12 / 20:  50%|████▉     | 776/1563 [00:22<00:23, 32.91it/s]

batch 770 loss: 0.3950567960739136


Train, Epoch 12 / 20:  50%|█████     | 784/1563 [00:22<00:24, 31.48it/s]

batch 780 loss: 0.4315871924161911


Train, Epoch 12 / 20:  51%|█████     | 796/1563 [00:22<00:24, 31.61it/s]

batch 790 loss: 0.27736495062708855


Train, Epoch 12 / 20:  51%|█████▏    | 804/1563 [00:22<00:24, 31.45it/s]

batch 800 loss: 0.40485174357891085


Train, Epoch 12 / 20:  52%|█████▏    | 816/1563 [00:23<00:24, 30.67it/s]

batch 810 loss: 0.44671532064676284


Train, Epoch 12 / 20:  53%|█████▎    | 824/1563 [00:23<00:23, 30.90it/s]

batch 820 loss: 0.3242449089884758


Train, Epoch 12 / 20:  53%|█████▎    | 835/1563 [00:23<00:24, 29.38it/s]

batch 830 loss: 0.4322133079171181


Train, Epoch 12 / 20:  54%|█████▍    | 847/1563 [00:24<00:22, 32.31it/s]

batch 840 loss: 0.40056618899106977


Train, Epoch 12 / 20:  55%|█████▍    | 855/1563 [00:24<00:20, 34.09it/s]

batch 850 loss: 0.4519622504711151


Train, Epoch 12 / 20:  55%|█████▌    | 867/1563 [00:24<00:19, 35.39it/s]

batch 860 loss: 0.31770913004875184


Train, Epoch 12 / 20:  56%|█████▌    | 875/1563 [00:25<00:19, 35.35it/s]

batch 870 loss: 0.3188213646411896


Train, Epoch 12 / 20:  57%|█████▋    | 887/1563 [00:25<00:18, 35.79it/s]

batch 880 loss: 0.3694703593850136


Train, Epoch 12 / 20:  57%|█████▋    | 895/1563 [00:25<00:18, 35.42it/s]

batch 890 loss: 0.32610526233911513


Train, Epoch 12 / 20:  58%|█████▊    | 907/1563 [00:26<00:18, 35.57it/s]

batch 900 loss: 0.35589743554592135


Train, Epoch 12 / 20:  59%|█████▊    | 915/1563 [00:26<00:18, 35.60it/s]

batch 910 loss: 0.38559111058712003


Train, Epoch 12 / 20:  59%|█████▉    | 927/1563 [00:26<00:17, 35.91it/s]

batch 920 loss: 0.35585748702287673


Train, Epoch 12 / 20:  60%|█████▉    | 935/1563 [00:26<00:17, 35.96it/s]

batch 930 loss: 0.4575529620051384


Train, Epoch 12 / 20:  61%|██████    | 947/1563 [00:27<00:17, 36.09it/s]

batch 940 loss: 0.2876890793442726


Train, Epoch 12 / 20:  61%|██████    | 955/1563 [00:27<00:17, 35.58it/s]

batch 950 loss: 0.3009865164756775


Train, Epoch 12 / 20:  62%|██████▏   | 967/1563 [00:27<00:17, 35.02it/s]

batch 960 loss: 0.4390340283513069


Train, Epoch 12 / 20:  62%|██████▏   | 975/1563 [00:27<00:16, 35.41it/s]

batch 970 loss: 0.4054592356085777


Train, Epoch 12 / 20:  63%|██████▎   | 987/1563 [00:28<00:16, 35.87it/s]

batch 980 loss: 0.34792992770671843


Train, Epoch 12 / 20:  64%|██████▎   | 995/1563 [00:28<00:16, 35.40it/s]

batch 990 loss: 0.41404649019241335


Train, Epoch 12 / 20:  64%|██████▍   | 1007/1563 [00:28<00:15, 35.78it/s]

batch 1000 loss: 0.34902787804603574


Train, Epoch 12 / 20:  65%|██████▍   | 1015/1563 [00:29<00:15, 36.02it/s]

batch 1010 loss: 0.3508224204182625


Train, Epoch 12 / 20:  66%|██████▌   | 1027/1563 [00:29<00:14, 35.89it/s]

batch 1020 loss: 0.3650048092007637


Train, Epoch 12 / 20:  66%|██████▌   | 1035/1563 [00:29<00:14, 35.49it/s]

batch 1030 loss: 0.39205713868141173


Train, Epoch 12 / 20:  67%|██████▋   | 1047/1563 [00:29<00:14, 35.71it/s]

batch 1040 loss: 0.42445592731237414


Train, Epoch 12 / 20:  67%|██████▋   | 1055/1563 [00:30<00:14, 35.62it/s]

batch 1050 loss: 0.3182023152709007


Train, Epoch 12 / 20:  68%|██████▊   | 1067/1563 [00:30<00:14, 35.40it/s]

batch 1060 loss: 0.3523333936929703


Train, Epoch 12 / 20:  69%|██████▉   | 1075/1563 [00:30<00:13, 35.05it/s]

batch 1070 loss: 0.443357914686203


Train, Epoch 12 / 20:  70%|██████▉   | 1087/1563 [00:31<00:13, 35.23it/s]

batch 1080 loss: 0.3004930213093758


Train, Epoch 12 / 20:  70%|███████   | 1095/1563 [00:31<00:13, 34.99it/s]

batch 1090 loss: 0.392294642329216


Train, Epoch 12 / 20:  71%|███████   | 1107/1563 [00:31<00:12, 35.92it/s]

batch 1100 loss: 0.3421920254826546


Train, Epoch 12 / 20:  71%|███████▏  | 1115/1563 [00:31<00:12, 36.38it/s]

batch 1110 loss: 0.3493833601474762


Train, Epoch 12 / 20:  72%|███████▏  | 1127/1563 [00:32<00:12, 35.73it/s]

batch 1120 loss: 0.4154796153306961


Train, Epoch 12 / 20:  73%|███████▎  | 1135/1563 [00:32<00:12, 35.29it/s]

batch 1130 loss: 0.3695786938071251


Train, Epoch 12 / 20:  73%|███████▎  | 1147/1563 [00:32<00:11, 35.75it/s]

batch 1140 loss: 0.46143576353788374


Train, Epoch 12 / 20:  74%|███████▍  | 1155/1563 [00:32<00:11, 36.00it/s]

batch 1150 loss: 0.27645375952124596


Train, Epoch 12 / 20:  75%|███████▍  | 1167/1563 [00:33<00:10, 36.42it/s]

batch 1160 loss: 0.38976619243621824


Train, Epoch 12 / 20:  75%|███████▌  | 1175/1563 [00:33<00:10, 35.37it/s]

batch 1170 loss: 0.35191634148359296


Train, Epoch 12 / 20:  76%|███████▌  | 1187/1563 [00:33<00:10, 36.03it/s]

batch 1180 loss: 0.31352042108774186


Train, Epoch 12 / 20:  76%|███████▋  | 1195/1563 [00:34<00:10, 33.85it/s]

batch 1190 loss: 0.36739195734262464


Train, Epoch 12 / 20:  77%|███████▋  | 1203/1563 [00:34<00:11, 32.63it/s]

batch 1200 loss: 0.3663678616285324


Train, Epoch 12 / 20:  78%|███████▊  | 1215/1563 [00:34<00:10, 32.29it/s]

batch 1210 loss: 0.3945263996720314


Train, Epoch 12 / 20:  78%|███████▊  | 1223/1563 [00:35<00:10, 31.69it/s]

batch 1220 loss: 0.4208851963281631


Train, Epoch 12 / 20:  79%|███████▉  | 1235/1563 [00:35<00:10, 32.38it/s]

batch 1230 loss: 0.3829298496246338


Train, Epoch 12 / 20:  80%|███████▉  | 1243/1563 [00:35<00:09, 32.68it/s]

batch 1240 loss: 0.3817994549870491


Train, Epoch 12 / 20:  80%|████████  | 1255/1563 [00:36<00:09, 31.74it/s]

batch 1250 loss: 0.32447545081377027


Train, Epoch 12 / 20:  81%|████████  | 1263/1563 [00:36<00:09, 30.98it/s]

batch 1260 loss: 0.3502087026834488


Train, Epoch 12 / 20:  82%|████████▏ | 1275/1563 [00:36<00:08, 32.32it/s]

batch 1270 loss: 0.3677471697330475


Train, Epoch 12 / 20:  82%|████████▏ | 1287/1563 [00:36<00:08, 34.33it/s]

batch 1280 loss: 0.3952482283115387


Train, Epoch 12 / 20:  83%|████████▎ | 1295/1563 [00:37<00:07, 34.97it/s]

batch 1290 loss: 0.38131695836782453


Train, Epoch 12 / 20:  84%|████████▎ | 1307/1563 [00:37<00:07, 35.49it/s]

batch 1300 loss: 0.3889144122600555


Train, Epoch 12 / 20:  84%|████████▍ | 1315/1563 [00:37<00:07, 35.34it/s]

batch 1310 loss: 0.47247290015220644


Train, Epoch 12 / 20:  85%|████████▍ | 1327/1563 [00:38<00:06, 36.16it/s]

batch 1320 loss: 0.3390332370996475


Train, Epoch 12 / 20:  85%|████████▌ | 1335/1563 [00:38<00:06, 35.67it/s]

batch 1330 loss: 0.3993662536144257


Train, Epoch 12 / 20:  86%|████████▌ | 1347/1563 [00:38<00:05, 36.28it/s]

batch 1340 loss: 0.4212983474135399


Train, Epoch 12 / 20:  87%|████████▋ | 1355/1563 [00:38<00:05, 36.33it/s]

batch 1350 loss: 0.37169191241264343


Train, Epoch 12 / 20:  87%|████████▋ | 1367/1563 [00:39<00:05, 36.16it/s]

batch 1360 loss: 0.3497394070029259


Train, Epoch 12 / 20:  88%|████████▊ | 1375/1563 [00:39<00:05, 35.79it/s]

batch 1370 loss: 0.4968081384897232


Train, Epoch 12 / 20:  89%|████████▊ | 1387/1563 [00:39<00:04, 36.12it/s]

batch 1380 loss: 0.35520009547472


Train, Epoch 12 / 20:  89%|████████▉ | 1395/1563 [00:40<00:04, 36.10it/s]

batch 1390 loss: 0.3532364681363106


Train, Epoch 12 / 20:  90%|█████████ | 1407/1563 [00:40<00:04, 36.07it/s]

batch 1400 loss: 0.3707468718290329


Train, Epoch 12 / 20:  91%|█████████ | 1415/1563 [00:40<00:04, 35.85it/s]

batch 1410 loss: 0.36434842348098756


Train, Epoch 12 / 20:  91%|█████████▏| 1427/1563 [00:40<00:03, 35.48it/s]

batch 1420 loss: 0.2958375483751297


Train, Epoch 12 / 20:  92%|█████████▏| 1435/1563 [00:41<00:03, 35.48it/s]

batch 1430 loss: 0.35490753799676894


Train, Epoch 12 / 20:  93%|█████████▎| 1447/1563 [00:41<00:03, 35.01it/s]

batch 1440 loss: 0.40779470801353457


Train, Epoch 12 / 20:  93%|█████████▎| 1455/1563 [00:41<00:03, 34.87it/s]

batch 1450 loss: 0.3291048467159271


Train, Epoch 12 / 20:  94%|█████████▍| 1467/1563 [00:42<00:02, 35.38it/s]

batch 1460 loss: 0.3370402678847313


Train, Epoch 12 / 20:  94%|█████████▍| 1475/1563 [00:42<00:02, 35.37it/s]

batch 1470 loss: 0.3981273129582405


Train, Epoch 12 / 20:  95%|█████████▌| 1487/1563 [00:42<00:02, 35.56it/s]

batch 1480 loss: 0.34627881497144697


Train, Epoch 12 / 20:  96%|█████████▌| 1495/1563 [00:42<00:01, 35.47it/s]

batch 1490 loss: 0.28901018798351286


Train, Epoch 12 / 20:  96%|█████████▋| 1507/1563 [00:43<00:01, 36.02it/s]

batch 1500 loss: 0.26213834509253503


Train, Epoch 12 / 20:  97%|█████████▋| 1515/1563 [00:43<00:01, 36.06it/s]

batch 1510 loss: 0.339213689416647


Train, Epoch 12 / 20:  98%|█████████▊| 1527/1563 [00:43<00:00, 36.19it/s]

batch 1520 loss: 0.4438116043806076


Train, Epoch 12 / 20:  98%|█████████▊| 1535/1563 [00:43<00:00, 35.80it/s]

batch 1530 loss: 0.47478911876678465


Train, Epoch 12 / 20:  99%|█████████▉| 1547/1563 [00:44<00:00, 36.06it/s]

batch 1540 loss: 0.40420149862766264


Train, Epoch 12 / 20:  99%|█████████▉| 1555/1563 [00:44<00:00, 35.53it/s]

batch 1550 loss: 0.3192225396633148


Train, Epoch 12 / 20: 100%|██████████| 1563/1563 [00:44<00:00, 34.94it/s]


batch 1560 loss: 0.3297904670238495


Test, Epoch 12 / 20: 100%|██████████| 1563/1563 [00:21<00:00, 71.62it/s]


Epoch 12, loss: 0.468350437002182, accuracy: 0.79268


Train, Epoch 13 / 20:   1%|          | 16/1563 [00:00<00:43, 35.66it/s]

batch 10 loss: 0.3593149557709694


Train, Epoch 13 / 20:   2%|▏         | 24/1563 [00:00<00:42, 36.22it/s]

batch 20 loss: 0.36840142011642457


Train, Epoch 13 / 20:   2%|▏         | 36/1563 [00:01<00:42, 35.71it/s]

batch 30 loss: 0.3441817730665207


Train, Epoch 13 / 20:   3%|▎         | 44/1563 [00:01<00:42, 35.51it/s]

batch 40 loss: 0.4003629289567471


Train, Epoch 13 / 20:   4%|▎         | 56/1563 [00:01<00:42, 35.69it/s]

batch 50 loss: 0.41973221600055693


Train, Epoch 13 / 20:   4%|▍         | 64/1563 [00:01<00:42, 35.59it/s]

batch 60 loss: 0.30627396106719973


Train, Epoch 13 / 20:   5%|▍         | 76/1563 [00:02<00:41, 35.87it/s]

batch 70 loss: 0.34672854989767077


Train, Epoch 13 / 20:   5%|▌         | 84/1563 [00:02<00:41, 35.51it/s]

batch 80 loss: 0.3673381581902504


Train, Epoch 13 / 20:   6%|▌         | 96/1563 [00:02<00:40, 35.78it/s]

batch 90 loss: 0.3116190262138844


Train, Epoch 13 / 20:   7%|▋         | 104/1563 [00:02<00:40, 36.00it/s]

batch 100 loss: 0.3519384518265724


Train, Epoch 13 / 20:   7%|▋         | 116/1563 [00:03<00:41, 35.25it/s]

batch 110 loss: 0.3503407061100006


Train, Epoch 13 / 20:   8%|▊         | 124/1563 [00:03<00:40, 35.35it/s]

batch 120 loss: 0.3308356419205666


Train, Epoch 13 / 20:   9%|▊         | 136/1563 [00:03<00:39, 35.76it/s]

batch 130 loss: 0.3616623982787132


Train, Epoch 13 / 20:   9%|▉         | 144/1563 [00:04<00:39, 36.13it/s]

batch 140 loss: 0.37589411437511444


Train, Epoch 13 / 20:  10%|▉         | 156/1563 [00:04<00:39, 35.93it/s]

batch 150 loss: 0.3514482453465462


Train, Epoch 13 / 20:  10%|█         | 164/1563 [00:04<00:38, 35.98it/s]

batch 160 loss: 0.3693530082702637


Train, Epoch 13 / 20:  11%|█▏        | 176/1563 [00:04<00:39, 34.87it/s]

batch 170 loss: 0.3397528752684593


Train, Epoch 13 / 20:  12%|█▏        | 184/1563 [00:05<00:41, 32.99it/s]

batch 180 loss: 0.32273182421922686


Train, Epoch 13 / 20:  13%|█▎        | 196/1563 [00:05<00:41, 33.31it/s]

batch 190 loss: 0.35267467200756075


Train, Epoch 13 / 20:  13%|█▎        | 204/1563 [00:05<00:40, 33.54it/s]

batch 200 loss: 0.3174229606986046


Train, Epoch 13 / 20:  14%|█▍        | 216/1563 [00:06<00:40, 33.15it/s]

batch 210 loss: 0.33330254554748534


Train, Epoch 13 / 20:  14%|█▍        | 224/1563 [00:06<00:41, 32.37it/s]

batch 220 loss: 0.330436085164547


Train, Epoch 13 / 20:  15%|█▌        | 236/1563 [00:06<00:42, 31.57it/s]

batch 230 loss: 0.3745050191879272


Train, Epoch 13 / 20:  16%|█▌        | 244/1563 [00:07<00:42, 31.40it/s]

batch 240 loss: 0.23463256061077117


Train, Epoch 13 / 20:  16%|█▌        | 252/1563 [00:07<00:43, 30.35it/s]

batch 250 loss: 0.3201641172170639


Train, Epoch 13 / 20:  17%|█▋        | 267/1563 [00:07<00:39, 32.95it/s]

batch 260 loss: 0.3358155101537704


Train, Epoch 13 / 20:  18%|█▊        | 275/1563 [00:08<00:36, 34.86it/s]

batch 270 loss: 0.36349318027496336


Train, Epoch 13 / 20:  18%|█▊        | 287/1563 [00:08<00:35, 35.58it/s]

batch 280 loss: 0.36680093705654143


Train, Epoch 13 / 20:  19%|█▉        | 295/1563 [00:08<00:35, 35.42it/s]

batch 290 loss: 0.44731478542089465


Train, Epoch 13 / 20:  20%|█▉        | 307/1563 [00:08<00:34, 36.16it/s]

batch 300 loss: 0.2766761541366577


Train, Epoch 13 / 20:  20%|██        | 315/1563 [00:09<00:34, 36.26it/s]

batch 310 loss: 0.367469085752964


Train, Epoch 13 / 20:  21%|██        | 327/1563 [00:09<00:35, 35.22it/s]

batch 320 loss: 0.29871291369199754


Train, Epoch 13 / 20:  21%|██▏       | 335/1563 [00:09<00:34, 35.42it/s]

batch 330 loss: 0.39652534425258634


Train, Epoch 13 / 20:  22%|██▏       | 347/1563 [00:10<00:34, 35.59it/s]

batch 340 loss: 0.371703764796257


Train, Epoch 13 / 20:  23%|██▎       | 355/1563 [00:10<00:33, 35.56it/s]

batch 350 loss: 0.44514441937208177


Train, Epoch 13 / 20:  23%|██▎       | 367/1563 [00:10<00:33, 35.67it/s]

batch 360 loss: 0.32266148626804353


Train, Epoch 13 / 20:  24%|██▍       | 375/1563 [00:10<00:33, 35.74it/s]

batch 370 loss: 0.3014571115374565


Train, Epoch 13 / 20:  25%|██▍       | 387/1563 [00:11<00:33, 35.30it/s]

batch 380 loss: 0.2632780492305756


Train, Epoch 13 / 20:  25%|██▌       | 395/1563 [00:11<00:33, 34.71it/s]

batch 390 loss: 0.40971807688474654


Train, Epoch 13 / 20:  26%|██▌       | 407/1563 [00:11<00:32, 35.30it/s]

batch 400 loss: 0.37743211835622786


Train, Epoch 13 / 20:  27%|██▋       | 415/1563 [00:11<00:31, 35.94it/s]

batch 410 loss: 0.30292063057422636


Train, Epoch 13 / 20:  27%|██▋       | 427/1563 [00:12<00:31, 35.69it/s]

batch 420 loss: 0.344054713845253


Train, Epoch 13 / 20:  28%|██▊       | 435/1563 [00:12<00:32, 35.10it/s]

batch 430 loss: 0.36276140436530113


Train, Epoch 13 / 20:  29%|██▊       | 447/1563 [00:12<00:31, 35.23it/s]

batch 440 loss: 0.3046403184533119


Train, Epoch 13 / 20:  29%|██▉       | 455/1563 [00:13<00:31, 35.34it/s]

batch 450 loss: 0.371855816245079


Train, Epoch 13 / 20:  30%|██▉       | 467/1563 [00:13<00:30, 35.87it/s]

batch 460 loss: 0.36838150694966315


Train, Epoch 13 / 20:  30%|███       | 475/1563 [00:13<00:30, 35.47it/s]

batch 470 loss: 0.37738287076354027


Train, Epoch 13 / 20:  31%|███       | 487/1563 [00:13<00:30, 35.80it/s]

batch 480 loss: 0.26979720890522


Train, Epoch 13 / 20:  32%|███▏      | 495/1563 [00:14<00:29, 35.96it/s]

batch 490 loss: 0.38884996622800827


Train, Epoch 13 / 20:  32%|███▏      | 503/1563 [00:14<00:29, 35.54it/s]

batch 500 loss: 0.3128175899386406


Train, Epoch 13 / 20:  33%|███▎      | 515/1563 [00:14<00:29, 34.98it/s]

batch 510 loss: 0.38124578893184663


Train, Epoch 13 / 20:  34%|███▎      | 527/1563 [00:15<00:29, 35.69it/s]

batch 520 loss: 0.36927522271871566


Train, Epoch 13 / 20:  34%|███▍      | 535/1563 [00:15<00:28, 35.65it/s]

batch 530 loss: 0.3785565882921219


Train, Epoch 13 / 20:  35%|███▍      | 547/1563 [00:15<00:28, 35.36it/s]

batch 540 loss: 0.2996019676327705


Train, Epoch 13 / 20:  36%|███▌      | 555/1563 [00:15<00:28, 34.84it/s]

batch 550 loss: 0.3689677432179451


Train, Epoch 13 / 20:  36%|███▋      | 567/1563 [00:16<00:28, 34.60it/s]

batch 560 loss: 0.3391321927309036


Train, Epoch 13 / 20:  37%|███▋      | 575/1563 [00:16<00:27, 35.44it/s]

batch 570 loss: 0.322065132856369


Train, Epoch 13 / 20:  38%|███▊      | 587/1563 [00:16<00:27, 35.72it/s]

batch 580 loss: 0.3092823699116707


Train, Epoch 13 / 20:  38%|███▊      | 595/1563 [00:17<00:27, 35.73it/s]

batch 590 loss: 0.3834451511502266


Train, Epoch 13 / 20:  39%|███▉      | 607/1563 [00:17<00:26, 36.08it/s]

batch 600 loss: 0.311421175301075


Train, Epoch 13 / 20:  39%|███▉      | 615/1563 [00:17<00:28, 33.58it/s]

batch 610 loss: 0.428113716840744


Train, Epoch 13 / 20:  40%|███▉      | 623/1563 [00:17<00:28, 32.48it/s]

batch 620 loss: 0.3995988205075264


Train, Epoch 13 / 20:  41%|████      | 635/1563 [00:18<00:28, 32.77it/s]

batch 630 loss: 0.3711946666240692


Train, Epoch 13 / 20:  41%|████      | 643/1563 [00:18<00:27, 33.20it/s]

batch 640 loss: 0.40260831713676454


Train, Epoch 13 / 20:  42%|████▏     | 655/1563 [00:18<00:27, 33.55it/s]

batch 650 loss: 0.3967803418636322


Train, Epoch 13 / 20:  42%|████▏     | 663/1563 [00:19<00:26, 33.96it/s]

batch 660 loss: 0.3337475873529911


Train, Epoch 13 / 20:  43%|████▎     | 675/1563 [00:19<00:26, 33.62it/s]

batch 670 loss: 0.34850018918514253


Train, Epoch 13 / 20:  44%|████▎     | 683/1563 [00:19<00:27, 32.05it/s]

batch 680 loss: 0.32608126774430274


Train, Epoch 13 / 20:  44%|████▍     | 695/1563 [00:20<00:26, 32.43it/s]

batch 690 loss: 0.25696384012699125


Train, Epoch 13 / 20:  45%|████▌     | 707/1563 [00:20<00:25, 33.36it/s]

batch 700 loss: 0.36239746809005735


Train, Epoch 13 / 20:  46%|████▌     | 715/1563 [00:20<00:24, 34.27it/s]

batch 710 loss: 0.3604253724217415


Train, Epoch 13 / 20:  47%|████▋     | 727/1563 [00:21<00:23, 35.53it/s]

batch 720 loss: 0.2897879436612129


Train, Epoch 13 / 20:  47%|████▋     | 735/1563 [00:21<00:23, 35.34it/s]

batch 730 loss: 0.25305346846580506


Train, Epoch 13 / 20:  48%|████▊     | 747/1563 [00:21<00:23, 35.32it/s]

batch 740 loss: 0.35343929976224897


Train, Epoch 13 / 20:  48%|████▊     | 755/1563 [00:21<00:23, 34.92it/s]

batch 750 loss: 0.36198700964450836


Train, Epoch 13 / 20:  49%|████▉     | 767/1563 [00:22<00:22, 35.59it/s]

batch 760 loss: 0.3831414759159088


Train, Epoch 13 / 20:  50%|████▉     | 775/1563 [00:22<00:21, 35.94it/s]

batch 770 loss: 0.34210824184119704


Train, Epoch 13 / 20:  50%|█████     | 787/1563 [00:22<00:21, 36.06it/s]

batch 780 loss: 0.3638757258653641


Train, Epoch 13 / 20:  51%|█████     | 795/1563 [00:22<00:21, 35.66it/s]

batch 790 loss: 0.3053913973271847


Train, Epoch 13 / 20:  52%|█████▏    | 807/1563 [00:23<00:21, 35.68it/s]

batch 800 loss: 0.4107202082872391


Train, Epoch 13 / 20:  52%|█████▏    | 815/1563 [00:23<00:21, 35.56it/s]

batch 810 loss: 0.30943494439125063


Train, Epoch 13 / 20:  53%|█████▎    | 827/1563 [00:23<00:20, 35.13it/s]

batch 820 loss: 0.44407810270786285


Train, Epoch 13 / 20:  53%|█████▎    | 835/1563 [00:24<00:20, 35.14it/s]

batch 830 loss: 0.42058680951595306


Train, Epoch 13 / 20:  54%|█████▍    | 847/1563 [00:24<00:19, 35.86it/s]

batch 840 loss: 0.32162185907363894


Train, Epoch 13 / 20:  55%|█████▍    | 855/1563 [00:24<00:19, 35.61it/s]

batch 850 loss: 0.3251505345106125


Train, Epoch 13 / 20:  55%|█████▌    | 867/1563 [00:24<00:19, 35.43it/s]

batch 860 loss: 0.33191371858119967


Train, Epoch 13 / 20:  56%|█████▌    | 875/1563 [00:25<00:19, 35.51it/s]

batch 870 loss: 0.3491035044193268


Train, Epoch 13 / 20:  57%|█████▋    | 887/1563 [00:25<00:18, 35.78it/s]

batch 880 loss: 0.3156034514307976


Train, Epoch 13 / 20:  57%|█████▋    | 895/1563 [00:25<00:18, 35.54it/s]

batch 890 loss: 0.32664736807346345


Train, Epoch 13 / 20:  58%|█████▊    | 907/1563 [00:26<00:18, 35.54it/s]

batch 900 loss: 0.308198569715023


Train, Epoch 13 / 20:  59%|█████▊    | 915/1563 [00:26<00:17, 36.03it/s]

batch 910 loss: 0.34658933579921725


Train, Epoch 13 / 20:  59%|█████▉    | 927/1563 [00:26<00:17, 35.75it/s]

batch 920 loss: 0.40834594145417213


Train, Epoch 13 / 20:  60%|█████▉    | 935/1563 [00:26<00:17, 35.77it/s]

batch 930 loss: 0.41816070675849915


Train, Epoch 13 / 20:  61%|██████    | 947/1563 [00:27<00:17, 35.48it/s]

batch 940 loss: 0.3291401579976082


Train, Epoch 13 / 20:  61%|██████    | 955/1563 [00:27<00:17, 35.47it/s]

batch 950 loss: 0.36499678939580915


Train, Epoch 13 / 20:  62%|██████▏   | 967/1563 [00:27<00:16, 35.81it/s]

batch 960 loss: 0.29675341546535494


Train, Epoch 13 / 20:  62%|██████▏   | 975/1563 [00:28<00:16, 35.43it/s]

batch 970 loss: 0.2989193186163902


Train, Epoch 13 / 20:  63%|██████▎   | 987/1563 [00:28<00:15, 36.06it/s]

batch 980 loss: 0.3886556804180145


Train, Epoch 13 / 20:  64%|██████▎   | 995/1563 [00:28<00:15, 36.07it/s]

batch 990 loss: 0.31172514855861666


Train, Epoch 13 / 20:  64%|██████▍   | 1007/1563 [00:28<00:15, 35.90it/s]

batch 1000 loss: 0.43038449585437777


Train, Epoch 13 / 20:  65%|██████▍   | 1015/1563 [00:29<00:15, 35.58it/s]

batch 1010 loss: 0.3689505189657211


Train, Epoch 13 / 20:  66%|██████▌   | 1027/1563 [00:29<00:14, 36.02it/s]

batch 1020 loss: 0.36132044196128843


Train, Epoch 13 / 20:  66%|██████▌   | 1035/1563 [00:29<00:15, 34.99it/s]

batch 1030 loss: 0.3936385825276375


Train, Epoch 13 / 20:  67%|██████▋   | 1047/1563 [00:30<00:14, 35.70it/s]

batch 1040 loss: 0.34452456533908843


Train, Epoch 13 / 20:  67%|██████▋   | 1055/1563 [00:30<00:14, 35.48it/s]

batch 1050 loss: 0.3212063670158386


Train, Epoch 13 / 20:  68%|██████▊   | 1063/1563 [00:30<00:15, 33.22it/s]

batch 1060 loss: 0.33780352100729943


Train, Epoch 13 / 20:  69%|██████▉   | 1075/1563 [00:30<00:15, 31.68it/s]

batch 1070 loss: 0.340173177421093


Train, Epoch 13 / 20:  69%|██████▉   | 1083/1563 [00:31<00:15, 31.28it/s]

batch 1080 loss: 0.33717800974845885


Train, Epoch 13 / 20:  70%|███████   | 1095/1563 [00:31<00:14, 31.81it/s]

batch 1090 loss: 0.41163143515586853


Train, Epoch 13 / 20:  71%|███████   | 1103/1563 [00:31<00:13, 32.96it/s]

batch 1100 loss: 0.3646605521440506


Train, Epoch 13 / 20:  71%|███████▏  | 1115/1563 [00:32<00:13, 32.60it/s]

batch 1110 loss: 0.3765969157218933


Train, Epoch 13 / 20:  72%|███████▏  | 1123/1563 [00:32<00:13, 33.31it/s]

batch 1120 loss: 0.3858094483613968


Train, Epoch 13 / 20:  73%|███████▎  | 1135/1563 [00:32<00:13, 31.65it/s]

batch 1130 loss: 0.3276358351111412


Train, Epoch 13 / 20:  73%|███████▎  | 1143/1563 [00:33<00:13, 31.04it/s]

batch 1140 loss: 0.3871169060468674


Train, Epoch 13 / 20:  74%|███████▍  | 1155/1563 [00:33<00:12, 33.31it/s]

batch 1150 loss: 0.31634355187416074


Train, Epoch 13 / 20:  75%|███████▍  | 1167/1563 [00:33<00:11, 34.95it/s]

batch 1160 loss: 0.40908468663692477


Train, Epoch 13 / 20:  75%|███████▌  | 1175/1563 [00:33<00:10, 35.69it/s]

batch 1170 loss: 0.39879812151193617


Train, Epoch 13 / 20:  76%|███████▌  | 1187/1563 [00:34<00:10, 35.78it/s]

batch 1180 loss: 0.3509214848279953


Train, Epoch 13 / 20:  76%|███████▋  | 1195/1563 [00:34<00:10, 35.75it/s]

batch 1190 loss: 0.36552542746067046


Train, Epoch 13 / 20:  77%|███████▋  | 1207/1563 [00:34<00:09, 35.81it/s]

batch 1200 loss: 0.40692717730998995


Train, Epoch 13 / 20:  78%|███████▊  | 1215/1563 [00:35<00:10, 34.74it/s]

batch 1210 loss: 0.38865603506565094


Train, Epoch 13 / 20:  79%|███████▊  | 1227/1563 [00:35<00:09, 35.09it/s]

batch 1220 loss: 0.38174968957901


Train, Epoch 13 / 20:  79%|███████▉  | 1235/1563 [00:35<00:09, 35.15it/s]

batch 1230 loss: 0.3496645286679268


Train, Epoch 13 / 20:  80%|███████▉  | 1247/1563 [00:35<00:09, 35.06it/s]

batch 1240 loss: 0.36652870029211043


Train, Epoch 13 / 20:  80%|████████  | 1255/1563 [00:36<00:08, 34.71it/s]

batch 1250 loss: 0.2975814610719681


Train, Epoch 13 / 20:  81%|████████  | 1267/1563 [00:36<00:08, 35.21it/s]

batch 1260 loss: 0.43778269588947294


Train, Epoch 13 / 20:  82%|████████▏ | 1275/1563 [00:36<00:08, 34.71it/s]

batch 1270 loss: 0.37725360691547394


Train, Epoch 13 / 20:  82%|████████▏ | 1287/1563 [00:37<00:07, 35.51it/s]

batch 1280 loss: 0.2977922186255455


Train, Epoch 13 / 20:  83%|████████▎ | 1295/1563 [00:37<00:07, 34.84it/s]

batch 1290 loss: 0.3720640271902084


Train, Epoch 13 / 20:  84%|████████▎ | 1307/1563 [00:37<00:07, 35.08it/s]

batch 1300 loss: 0.43470010459423064


Train, Epoch 13 / 20:  84%|████████▍ | 1315/1563 [00:37<00:07, 35.21it/s]

batch 1310 loss: 0.3682734534144402


Train, Epoch 13 / 20:  85%|████████▍ | 1327/1563 [00:38<00:06, 35.78it/s]

batch 1320 loss: 0.3514553986489773


Train, Epoch 13 / 20:  85%|████████▌ | 1335/1563 [00:38<00:06, 35.41it/s]

batch 1330 loss: 0.3099899724125862


Train, Epoch 13 / 20:  86%|████████▌ | 1347/1563 [00:38<00:06, 35.88it/s]

batch 1340 loss: 0.34812676459550856


Train, Epoch 13 / 20:  87%|████████▋ | 1355/1563 [00:39<00:05, 35.65it/s]

batch 1350 loss: 0.34737287610769274


Train, Epoch 13 / 20:  87%|████████▋ | 1367/1563 [00:39<00:05, 35.62it/s]

batch 1360 loss: 0.33866149336099627


Train, Epoch 13 / 20:  88%|████████▊ | 1375/1563 [00:39<00:05, 35.24it/s]

batch 1370 loss: 0.3842355296015739


Train, Epoch 13 / 20:  89%|████████▊ | 1387/1563 [00:39<00:04, 36.01it/s]

batch 1380 loss: 0.3136120349168777


Train, Epoch 13 / 20:  89%|████████▉ | 1395/1563 [00:40<00:04, 35.96it/s]

batch 1390 loss: 0.38881799280643464


Train, Epoch 13 / 20:  90%|█████████ | 1407/1563 [00:40<00:04, 35.35it/s]

batch 1400 loss: 0.39370649829506876


Train, Epoch 13 / 20:  91%|█████████ | 1415/1563 [00:40<00:04, 35.21it/s]

batch 1410 loss: 0.3131963059306145


Train, Epoch 13 / 20:  91%|█████████▏| 1427/1563 [00:41<00:03, 35.99it/s]

batch 1420 loss: 0.3651390805840492


Train, Epoch 13 / 20:  92%|█████████▏| 1435/1563 [00:41<00:03, 35.73it/s]

batch 1430 loss: 0.4215614140033722


Train, Epoch 13 / 20:  93%|█████████▎| 1447/1563 [00:41<00:03, 35.92it/s]

batch 1440 loss: 0.3711686462163925


Train, Epoch 13 / 20:  93%|█████████▎| 1455/1563 [00:41<00:03, 35.45it/s]

batch 1450 loss: 0.28436925560235976


Train, Epoch 13 / 20:  94%|█████████▍| 1467/1563 [00:42<00:02, 35.84it/s]

batch 1460 loss: 0.39686339646577834


Train, Epoch 13 / 20:  94%|█████████▍| 1475/1563 [00:42<00:02, 35.41it/s]

batch 1470 loss: 0.2769531235098839


Train, Epoch 13 / 20:  95%|█████████▌| 1487/1563 [00:42<00:02, 35.94it/s]

batch 1480 loss: 0.3670981377363205


Train, Epoch 13 / 20:  96%|█████████▌| 1495/1563 [00:42<00:01, 36.16it/s]

batch 1490 loss: 0.41550211012363436


Train, Epoch 13 / 20:  96%|█████████▌| 1503/1563 [00:43<00:01, 34.06it/s]

batch 1500 loss: 0.33400464802980423


Train, Epoch 13 / 20:  97%|█████████▋| 1515/1563 [00:43<00:01, 31.79it/s]

batch 1510 loss: 0.360181188583374


Train, Epoch 13 / 20:  97%|█████████▋| 1523/1563 [00:43<00:01, 31.21it/s]

batch 1520 loss: 0.2983944624662399


Train, Epoch 13 / 20:  98%|█████████▊| 1535/1563 [00:44<00:00, 31.63it/s]

batch 1530 loss: 0.3463110998272896


Train, Epoch 13 / 20:  99%|█████████▊| 1543/1563 [00:44<00:00, 31.67it/s]

batch 1540 loss: 0.361184224486351


Train, Epoch 13 / 20:  99%|█████████▉| 1555/1563 [00:44<00:00, 30.64it/s]

batch 1550 loss: 0.32333030700683596


Train, Epoch 13 / 20: 100%|██████████| 1563/1563 [00:45<00:00, 34.60it/s]


batch 1560 loss: 0.3308419153094292


Test, Epoch 13 / 20: 100%|██████████| 1563/1563 [00:21<00:00, 72.91it/s]


Epoch 13, loss: 0.4710717241358757, accuracy: 0.79684


Train, Epoch 14 / 20:   1%|          | 16/1563 [00:00<00:43, 35.92it/s]

batch 10 loss: 0.3113094449043274


Train, Epoch 14 / 20:   2%|▏         | 24/1563 [00:00<00:43, 35.31it/s]

batch 20 loss: 0.3204644531011581


Train, Epoch 14 / 20:   2%|▏         | 36/1563 [00:01<00:43, 35.38it/s]

batch 30 loss: 0.3123077303171158


Train, Epoch 14 / 20:   3%|▎         | 44/1563 [00:01<00:43, 35.04it/s]

batch 40 loss: 0.3392203837633133


Train, Epoch 14 / 20:   4%|▎         | 56/1563 [00:01<00:42, 35.77it/s]

batch 50 loss: 0.3437723532319069


Train, Epoch 14 / 20:   4%|▍         | 64/1563 [00:01<00:45, 33.01it/s]

batch 60 loss: 0.3890348970890045


Train, Epoch 14 / 20:   5%|▍         | 76/1563 [00:02<00:46, 31.82it/s]

batch 70 loss: 0.2975211426615715


Train, Epoch 14 / 20:   5%|▌         | 84/1563 [00:02<00:47, 30.92it/s]

batch 80 loss: 0.3669411584734917


Train, Epoch 14 / 20:   6%|▌         | 96/1563 [00:02<00:45, 32.10it/s]

batch 90 loss: 0.41213883459568024


Train, Epoch 14 / 20:   7%|▋         | 104/1563 [00:03<00:44, 33.03it/s]

batch 100 loss: 0.37650189101696013


Train, Epoch 14 / 20:   7%|▋         | 116/1563 [00:03<00:43, 32.90it/s]

batch 110 loss: 0.36030846536159516


Train, Epoch 14 / 20:   8%|▊         | 124/1563 [00:03<00:44, 32.09it/s]

batch 120 loss: 0.3131928712129593


Train, Epoch 14 / 20:   9%|▊         | 136/1563 [00:04<00:45, 31.64it/s]

batch 130 loss: 0.4083887353539467


Train, Epoch 14 / 20:   9%|▉         | 144/1563 [00:04<00:44, 31.73it/s]

batch 140 loss: 0.3913951814174652


Train, Epoch 14 / 20:  10%|▉         | 156/1563 [00:04<00:40, 34.68it/s]

batch 150 loss: 0.3655686929821968


Train, Epoch 14 / 20:  10%|█         | 164/1563 [00:04<00:40, 34.88it/s]

batch 160 loss: 0.25322553515434265


Train, Epoch 14 / 20:  11%|█▏        | 176/1563 [00:05<00:39, 35.12it/s]

batch 170 loss: 0.29860032349824905


Train, Epoch 14 / 20:  12%|█▏        | 184/1563 [00:05<00:39, 35.00it/s]

batch 180 loss: 0.2991363421082497


Train, Epoch 14 / 20:  13%|█▎        | 196/1563 [00:05<00:38, 35.59it/s]

batch 190 loss: 0.31463335901498796


Train, Epoch 14 / 20:  13%|█▎        | 204/1563 [00:06<00:38, 35.06it/s]

batch 200 loss: 0.3443184643983841


Train, Epoch 14 / 20:  14%|█▍        | 216/1563 [00:06<00:38, 35.44it/s]

batch 210 loss: 0.3327160507440567


Train, Epoch 14 / 20:  14%|█▍        | 224/1563 [00:06<00:37, 35.38it/s]

batch 220 loss: 0.3604187905788422


Train, Epoch 14 / 20:  15%|█▌        | 236/1563 [00:06<00:37, 35.03it/s]

batch 230 loss: 0.38632612079381945


Train, Epoch 14 / 20:  16%|█▌        | 244/1563 [00:07<00:37, 35.02it/s]

batch 240 loss: 0.31200001686811446


Train, Epoch 14 / 20:  16%|█▋        | 256/1563 [00:07<00:36, 35.69it/s]

batch 250 loss: 0.407105852663517


Train, Epoch 14 / 20:  17%|█▋        | 264/1563 [00:07<00:36, 35.69it/s]

batch 260 loss: 0.3414460062980652


Train, Epoch 14 / 20:  18%|█▊        | 276/1563 [00:08<00:36, 35.57it/s]

batch 270 loss: 0.38516679108142854


Train, Epoch 14 / 20:  18%|█▊        | 284/1563 [00:08<00:36, 35.34it/s]

batch 280 loss: 0.41082680225372314


Train, Epoch 14 / 20:  19%|█▉        | 296/1563 [00:08<00:35, 35.77it/s]

batch 290 loss: 0.389367949962616


Train, Epoch 14 / 20:  19%|█▉        | 304/1563 [00:08<00:35, 35.86it/s]

batch 300 loss: 0.3990132614970207


Train, Epoch 14 / 20:  20%|██        | 316/1563 [00:09<00:35, 35.59it/s]

batch 310 loss: 0.40803112387657164


Train, Epoch 14 / 20:  21%|██        | 324/1563 [00:09<00:35, 35.21it/s]

batch 320 loss: 0.35061719417572024


Train, Epoch 14 / 20:  21%|██▏       | 336/1563 [00:09<00:34, 35.31it/s]

batch 330 loss: 0.40283129513263705


Train, Epoch 14 / 20:  22%|██▏       | 344/1563 [00:10<00:35, 34.77it/s]

batch 340 loss: 0.3434580072760582


Train, Epoch 14 / 20:  23%|██▎       | 356/1563 [00:10<00:34, 35.06it/s]

batch 350 loss: 0.3605380550026894


Train, Epoch 14 / 20:  23%|██▎       | 364/1563 [00:10<00:34, 35.13it/s]

batch 360 loss: 0.377105338871479


Train, Epoch 14 / 20:  24%|██▍       | 376/1563 [00:10<00:33, 35.72it/s]

batch 370 loss: 0.32900849133729937


Train, Epoch 14 / 20:  25%|██▍       | 384/1563 [00:11<00:33, 34.74it/s]

batch 380 loss: 0.3453423798084259


Train, Epoch 14 / 20:  25%|██▌       | 396/1563 [00:11<00:33, 35.00it/s]

batch 390 loss: 0.2662492886185646


Train, Epoch 14 / 20:  26%|██▌       | 404/1563 [00:11<00:32, 35.21it/s]

batch 400 loss: 0.35681954622268675


Train, Epoch 14 / 20:  27%|██▋       | 416/1563 [00:12<00:32, 35.42it/s]

batch 410 loss: 0.28508122712373735


Train, Epoch 14 / 20:  27%|██▋       | 424/1563 [00:12<00:31, 35.65it/s]

batch 420 loss: 0.30091702118515967


Train, Epoch 14 / 20:  28%|██▊       | 436/1563 [00:12<00:31, 35.48it/s]

batch 430 loss: 0.2785988196730614


Train, Epoch 14 / 20:  28%|██▊       | 444/1563 [00:12<00:31, 35.11it/s]

batch 440 loss: 0.2799269862473011


Train, Epoch 14 / 20:  29%|██▉       | 456/1563 [00:13<00:31, 34.97it/s]

batch 450 loss: 0.33196703642606734


Train, Epoch 14 / 20:  30%|██▉       | 464/1563 [00:13<00:31, 34.96it/s]

batch 460 loss: 0.3554013580083847


Train, Epoch 14 / 20:  30%|███       | 476/1563 [00:13<00:30, 35.33it/s]

batch 470 loss: 0.3086026109755039


Train, Epoch 14 / 20:  31%|███       | 484/1563 [00:14<00:30, 35.24it/s]

batch 480 loss: 0.3004302129149437


Train, Epoch 14 / 20:  32%|███▏      | 496/1563 [00:14<00:31, 34.37it/s]

batch 490 loss: 0.29872337356209755


Train, Epoch 14 / 20:  32%|███▏      | 504/1563 [00:14<00:32, 32.48it/s]

batch 500 loss: 0.3331807002425194


Train, Epoch 14 / 20:  33%|███▎      | 516/1563 [00:14<00:32, 31.80it/s]

batch 510 loss: 0.36481517255306245


Train, Epoch 14 / 20:  34%|███▎      | 524/1563 [00:15<00:33, 31.25it/s]

batch 520 loss: 0.37219519913196564


Train, Epoch 14 / 20:  34%|███▍      | 536/1563 [00:15<00:33, 30.60it/s]

batch 530 loss: 0.39859139919281006


Train, Epoch 14 / 20:  35%|███▍      | 544/1563 [00:15<00:32, 31.63it/s]

batch 540 loss: 0.3498071223497391


Train, Epoch 14 / 20:  36%|███▌      | 556/1563 [00:16<00:31, 31.84it/s]

batch 550 loss: 0.3028522178530693


Train, Epoch 14 / 20:  36%|███▌      | 564/1563 [00:16<00:32, 31.09it/s]

batch 560 loss: 0.31759328246116636


Train, Epoch 14 / 20:  37%|███▋      | 576/1563 [00:16<00:32, 30.49it/s]

batch 570 loss: 0.3838722497224808


Train, Epoch 14 / 20:  37%|███▋      | 584/1563 [00:17<00:30, 32.36it/s]

batch 580 loss: 0.2821755319833755


Train, Epoch 14 / 20:  38%|███▊      | 596/1563 [00:17<00:28, 34.39it/s]

batch 590 loss: 0.37440006136894227


Train, Epoch 14 / 20:  39%|███▊      | 604/1563 [00:17<00:27, 34.73it/s]

batch 600 loss: 0.3801796302199364


Train, Epoch 14 / 20:  39%|███▉      | 616/1563 [00:18<00:26, 35.33it/s]

batch 610 loss: 0.3712271124124527


Train, Epoch 14 / 20:  40%|███▉      | 624/1563 [00:18<00:26, 35.04it/s]

batch 620 loss: 0.2865396775305271


Train, Epoch 14 / 20:  41%|████      | 636/1563 [00:18<00:25, 35.92it/s]

batch 630 loss: 0.40677957981824875


Train, Epoch 14 / 20:  41%|████      | 644/1563 [00:18<00:25, 35.95it/s]

batch 640 loss: 0.2720049723982811


Train, Epoch 14 / 20:  42%|████▏     | 656/1563 [00:19<00:25, 36.28it/s]

batch 650 loss: 0.3071285620331764


Train, Epoch 14 / 20:  42%|████▏     | 664/1563 [00:19<00:25, 35.48it/s]

batch 660 loss: 0.30387231558561323


Train, Epoch 14 / 20:  43%|████▎     | 676/1563 [00:19<00:24, 36.02it/s]

batch 670 loss: 0.3648656189441681


Train, Epoch 14 / 20:  44%|████▍     | 684/1563 [00:19<00:24, 35.89it/s]

batch 680 loss: 0.43849576711654664


Train, Epoch 14 / 20:  45%|████▍     | 696/1563 [00:20<00:24, 35.86it/s]

batch 690 loss: 0.3683212101459503


Train, Epoch 14 / 20:  45%|████▌     | 704/1563 [00:20<00:24, 35.60it/s]

batch 700 loss: 0.3394386231899261


Train, Epoch 14 / 20:  46%|████▌     | 716/1563 [00:20<00:23, 35.80it/s]

batch 710 loss: 0.3280916020274162


Train, Epoch 14 / 20:  46%|████▋     | 724/1563 [00:21<00:23, 35.54it/s]

batch 720 loss: 0.3339583545923233


Train, Epoch 14 / 20:  47%|████▋     | 736/1563 [00:21<00:23, 35.81it/s]

batch 730 loss: 0.29259379357099535


Train, Epoch 14 / 20:  48%|████▊     | 744/1563 [00:21<00:23, 35.59it/s]

batch 740 loss: 0.3758934199810028


Train, Epoch 14 / 20:  48%|████▊     | 756/1563 [00:21<00:22, 36.26it/s]

batch 750 loss: 0.36954293996095655


Train, Epoch 14 / 20:  49%|████▉     | 764/1563 [00:22<00:22, 35.50it/s]

batch 760 loss: 0.3285516917705536


Train, Epoch 14 / 20:  50%|████▉     | 776/1563 [00:22<00:22, 35.41it/s]

batch 770 loss: 0.281958869099617


Train, Epoch 14 / 20:  50%|█████     | 784/1563 [00:22<00:21, 36.06it/s]

batch 780 loss: 0.3302077382802963


Train, Epoch 14 / 20:  51%|█████     | 796/1563 [00:23<00:21, 36.27it/s]

batch 790 loss: 0.37096191197633743


Train, Epoch 14 / 20:  51%|█████▏    | 804/1563 [00:23<00:20, 36.35it/s]

batch 800 loss: 0.4047666504979134


Train, Epoch 14 / 20:  52%|█████▏    | 816/1563 [00:23<00:20, 36.12it/s]

batch 810 loss: 0.3165565863251686


Train, Epoch 14 / 20:  53%|█████▎    | 824/1563 [00:23<00:20, 35.89it/s]

batch 820 loss: 0.3613091692328453


Train, Epoch 14 / 20:  53%|█████▎    | 836/1563 [00:24<00:20, 35.59it/s]

batch 830 loss: 0.27052419632673264


Train, Epoch 14 / 20:  54%|█████▍    | 844/1563 [00:24<00:20, 35.18it/s]

batch 840 loss: 0.3380914106965065


Train, Epoch 14 / 20:  55%|█████▍    | 856/1563 [00:24<00:20, 35.23it/s]

batch 850 loss: 0.38410876542329786


Train, Epoch 14 / 20:  55%|█████▌    | 864/1563 [00:25<00:19, 35.22it/s]

batch 860 loss: 0.3620461583137512


Train, Epoch 14 / 20:  56%|█████▌    | 876/1563 [00:25<00:19, 35.84it/s]

batch 870 loss: 0.2896915152668953


Train, Epoch 14 / 20:  57%|█████▋    | 884/1563 [00:25<00:19, 35.64it/s]

batch 880 loss: 0.372999906539917


Train, Epoch 14 / 20:  57%|█████▋    | 896/1563 [00:25<00:18, 35.65it/s]

batch 890 loss: 0.30715763866901397


Train, Epoch 14 / 20:  58%|█████▊    | 904/1563 [00:26<00:18, 35.75it/s]

batch 900 loss: 0.377298030257225


Train, Epoch 14 / 20:  59%|█████▊    | 916/1563 [00:26<00:18, 35.19it/s]

batch 910 loss: 0.3409592375159264


Train, Epoch 14 / 20:  59%|█████▉    | 924/1563 [00:26<00:18, 34.86it/s]

batch 920 loss: 0.3439858391880989


Train, Epoch 14 / 20:  60%|█████▉    | 936/1563 [00:27<00:18, 34.28it/s]

batch 930 loss: 0.31346673220396043


Train, Epoch 14 / 20:  60%|██████    | 944/1563 [00:27<00:19, 32.10it/s]

batch 940 loss: 0.3576545789837837


Train, Epoch 14 / 20:  61%|██████    | 956/1563 [00:27<00:19, 31.57it/s]

batch 950 loss: 0.2723551444709301


Train, Epoch 14 / 20:  62%|██████▏   | 964/1563 [00:27<00:18, 31.61it/s]

batch 960 loss: 0.385029973089695


Train, Epoch 14 / 20:  62%|██████▏   | 976/1563 [00:28<00:18, 31.57it/s]

batch 970 loss: 0.36057842820882796


Train, Epoch 14 / 20:  63%|██████▎   | 984/1563 [00:28<00:18, 31.62it/s]

batch 980 loss: 0.3754238814115524


Train, Epoch 14 / 20:  64%|██████▎   | 996/1563 [00:29<00:18, 30.54it/s]

batch 990 loss: 0.3497675538063049


Train, Epoch 14 / 20:  64%|██████▍   | 1004/1563 [00:29<00:18, 29.74it/s]

batch 1000 loss: 0.2732315227389336


Train, Epoch 14 / 20:  65%|██████▍   | 1013/1563 [00:29<00:18, 29.30it/s]

batch 1010 loss: 0.31456473618745806


Train, Epoch 14 / 20:  66%|██████▌   | 1025/1563 [00:29<00:16, 33.35it/s]

batch 1020 loss: 0.3785484775900841


Train, Epoch 14 / 20:  66%|██████▋   | 1037/1563 [00:30<00:14, 35.13it/s]

batch 1030 loss: 0.37909669280052183


Train, Epoch 14 / 20:  67%|██████▋   | 1045/1563 [00:30<00:14, 35.63it/s]

batch 1040 loss: 0.2911024659872055


Train, Epoch 14 / 20:  68%|██████▊   | 1057/1563 [00:30<00:14, 35.25it/s]

batch 1050 loss: 0.2812854401767254


Train, Epoch 14 / 20:  68%|██████▊   | 1065/1563 [00:31<00:14, 35.38it/s]

batch 1060 loss: 0.27956932187080386


Train, Epoch 14 / 20:  69%|██████▉   | 1077/1563 [00:31<00:13, 35.65it/s]

batch 1070 loss: 0.3676760897040367


Train, Epoch 14 / 20:  69%|██████▉   | 1085/1563 [00:31<00:13, 35.35it/s]

batch 1080 loss: 0.2948817238211632


Train, Epoch 14 / 20:  70%|███████   | 1097/1563 [00:31<00:13, 34.94it/s]

batch 1090 loss: 0.3499604120850563


Train, Epoch 14 / 20:  71%|███████   | 1105/1563 [00:32<00:13, 34.96it/s]

batch 1100 loss: 0.31422358751296997


Train, Epoch 14 / 20:  71%|███████▏  | 1117/1563 [00:32<00:12, 35.64it/s]

batch 1110 loss: 0.34407214969396593


Train, Epoch 14 / 20:  72%|███████▏  | 1125/1563 [00:32<00:12, 35.25it/s]

batch 1120 loss: 0.39929078221321107


Train, Epoch 14 / 20:  73%|███████▎  | 1137/1563 [00:33<00:12, 35.06it/s]

batch 1130 loss: 0.3014751821756363


Train, Epoch 14 / 20:  73%|███████▎  | 1145/1563 [00:33<00:11, 35.64it/s]

batch 1140 loss: 0.4411704882979393


Train, Epoch 14 / 20:  74%|███████▍  | 1157/1563 [00:33<00:11, 35.48it/s]

batch 1150 loss: 0.29521182328462603


Train, Epoch 14 / 20:  75%|███████▍  | 1165/1563 [00:33<00:11, 34.74it/s]

batch 1160 loss: 0.30482529401779174


Train, Epoch 14 / 20:  75%|███████▌  | 1177/1563 [00:34<00:10, 35.49it/s]

batch 1170 loss: 0.2828119032084942


Train, Epoch 14 / 20:  76%|███████▌  | 1185/1563 [00:34<00:10, 35.44it/s]

batch 1180 loss: 0.39166630804538727


Train, Epoch 14 / 20:  77%|███████▋  | 1197/1563 [00:34<00:10, 35.86it/s]

batch 1190 loss: 0.3334889754652977


Train, Epoch 14 / 20:  77%|███████▋  | 1205/1563 [00:35<00:10, 35.53it/s]

batch 1200 loss: 0.3125017613172531


Train, Epoch 14 / 20:  78%|███████▊  | 1217/1563 [00:35<00:09, 35.29it/s]

batch 1210 loss: 0.2649933785200119


Train, Epoch 14 / 20:  78%|███████▊  | 1225/1563 [00:35<00:09, 35.33it/s]

batch 1220 loss: 0.3794757291674614


Train, Epoch 14 / 20:  79%|███████▉  | 1237/1563 [00:35<00:09, 35.77it/s]

batch 1230 loss: 0.31226966977119447


Train, Epoch 14 / 20:  80%|███████▉  | 1245/1563 [00:36<00:08, 35.57it/s]

batch 1240 loss: 0.360966719686985


Train, Epoch 14 / 20:  80%|████████  | 1257/1563 [00:36<00:08, 35.31it/s]

batch 1250 loss: 0.38035961985588074


Train, Epoch 14 / 20:  81%|████████  | 1265/1563 [00:36<00:08, 35.18it/s]

batch 1260 loss: 0.2747424252331257


Train, Epoch 14 / 20:  82%|████████▏ | 1277/1563 [00:37<00:08, 35.56it/s]

batch 1270 loss: 0.35361830443143843


Train, Epoch 14 / 20:  82%|████████▏ | 1285/1563 [00:37<00:07, 35.26it/s]

batch 1280 loss: 0.3250999331474304


Train, Epoch 14 / 20:  83%|████████▎ | 1297/1563 [00:37<00:07, 35.75it/s]

batch 1290 loss: 0.3555625393986702


Train, Epoch 14 / 20:  83%|████████▎ | 1305/1563 [00:37<00:07, 34.73it/s]

batch 1300 loss: 0.25905896797776223


Train, Epoch 14 / 20:  84%|████████▍ | 1317/1563 [00:38<00:06, 35.42it/s]

batch 1310 loss: 0.26972548961639403


Train, Epoch 14 / 20:  85%|████████▍ | 1325/1563 [00:38<00:06, 35.63it/s]

batch 1320 loss: 0.334060563147068


Train, Epoch 14 / 20:  86%|████████▌ | 1337/1563 [00:38<00:06, 35.60it/s]

batch 1330 loss: 0.29251156747341156


Train, Epoch 14 / 20:  86%|████████▌ | 1345/1563 [00:39<00:06, 35.42it/s]

batch 1340 loss: 0.2863501898944378


Train, Epoch 14 / 20:  87%|████████▋ | 1357/1563 [00:39<00:05, 35.81it/s]

batch 1350 loss: 0.47429694831371305


Train, Epoch 14 / 20:  87%|████████▋ | 1365/1563 [00:39<00:05, 35.52it/s]

batch 1360 loss: 0.36164400205016134


Train, Epoch 14 / 20:  88%|████████▊ | 1373/1563 [00:39<00:05, 34.49it/s]

batch 1370 loss: 0.2800554007291794


Train, Epoch 14 / 20:  89%|████████▊ | 1385/1563 [00:40<00:05, 32.87it/s]

batch 1380 loss: 0.3110888212919235


Train, Epoch 14 / 20:  89%|████████▉ | 1393/1563 [00:40<00:05, 32.87it/s]

batch 1390 loss: 0.3438840121030807


Train, Epoch 14 / 20:  90%|████████▉ | 1405/1563 [00:40<00:04, 32.53it/s]

batch 1400 loss: 0.32301430553197863


Train, Epoch 14 / 20:  90%|█████████ | 1413/1563 [00:41<00:04, 31.16it/s]

batch 1410 loss: 0.2337585672736168


Train, Epoch 14 / 20:  91%|█████████ | 1425/1563 [00:41<00:04, 31.95it/s]

batch 1420 loss: 0.31741892248392106


Train, Epoch 14 / 20:  92%|█████████▏| 1433/1563 [00:41<00:04, 31.51it/s]

batch 1430 loss: 0.3207002617418766


Train, Epoch 14 / 20:  92%|█████████▏| 1445/1563 [00:42<00:03, 31.42it/s]

batch 1440 loss: 0.38767547607421876


Train, Epoch 14 / 20:  93%|█████████▎| 1453/1563 [00:42<00:03, 29.97it/s]

batch 1450 loss: 0.3571295544505119


Train, Epoch 14 / 20:  94%|█████████▎| 1465/1563 [00:42<00:02, 33.02it/s]

batch 1460 loss: 0.3615266099572182


Train, Epoch 14 / 20:  94%|█████████▍| 1477/1563 [00:43<00:02, 34.20it/s]

batch 1470 loss: 0.3037907026708126


Train, Epoch 14 / 20:  95%|█████████▌| 1485/1563 [00:43<00:02, 34.73it/s]

batch 1480 loss: 0.29858420938253405


Train, Epoch 14 / 20:  96%|█████████▌| 1497/1563 [00:43<00:01, 35.61it/s]

batch 1490 loss: 0.3604987137019634


Train, Epoch 14 / 20:  96%|█████████▋| 1505/1563 [00:43<00:01, 35.03it/s]

batch 1500 loss: 0.28297162503004075


Train, Epoch 14 / 20:  97%|█████████▋| 1517/1563 [00:44<00:01, 35.30it/s]

batch 1510 loss: 0.29888855665922165


Train, Epoch 14 / 20:  98%|█████████▊| 1525/1563 [00:44<00:01, 35.10it/s]

batch 1520 loss: 0.3314529202878475


Train, Epoch 14 / 20:  98%|█████████▊| 1537/1563 [00:44<00:00, 35.45it/s]

batch 1530 loss: 0.35184844881296157


Train, Epoch 14 / 20:  99%|█████████▉| 1545/1563 [00:45<00:00, 34.75it/s]

batch 1540 loss: 0.29445867240428925


Train, Epoch 14 / 20: 100%|█████████▉| 1557/1563 [00:45<00:00, 35.58it/s]

batch 1550 loss: 0.4507750734686852


Train, Epoch 14 / 20: 100%|██████████| 1563/1563 [00:45<00:00, 34.35it/s]


batch 1560 loss: 0.4254449740052223


Test, Epoch 14 / 20: 100%|██████████| 1563/1563 [00:21<00:00, 71.44it/s]


Epoch 14, loss: 0.45963652189731596, accuracy: 0.79392


Train, Epoch 15 / 20:   1%|          | 16/1563 [00:00<00:45, 34.08it/s]

batch 10 loss: 0.32622904181480405


Train, Epoch 15 / 20:   2%|▏         | 24/1563 [00:00<00:44, 34.30it/s]

batch 20 loss: 0.35242737978696825


Train, Epoch 15 / 20:   2%|▏         | 36/1563 [00:01<00:43, 35.34it/s]

batch 30 loss: 0.28005481511354446


Train, Epoch 15 / 20:   3%|▎         | 44/1563 [00:01<00:42, 35.54it/s]

batch 40 loss: 0.35427545979619024


Train, Epoch 15 / 20:   4%|▎         | 56/1563 [00:01<00:42, 35.46it/s]

batch 50 loss: 0.3373066321015358


Train, Epoch 15 / 20:   4%|▍         | 64/1563 [00:01<00:42, 35.05it/s]

batch 60 loss: 0.26075016856193545


Train, Epoch 15 / 20:   5%|▍         | 76/1563 [00:02<00:41, 35.52it/s]

batch 70 loss: 0.2963825650513172


Train, Epoch 15 / 20:   5%|▌         | 84/1563 [00:02<00:41, 35.91it/s]

batch 80 loss: 0.3601320669054985


Train, Epoch 15 / 20:   6%|▌         | 96/1563 [00:02<00:40, 36.17it/s]

batch 90 loss: 0.36458888500928877


Train, Epoch 15 / 20:   7%|▋         | 104/1563 [00:02<00:40, 35.63it/s]

batch 100 loss: 0.39818159490823746


Train, Epoch 15 / 20:   7%|▋         | 116/1563 [00:03<00:40, 35.68it/s]

batch 110 loss: 0.3127744413912296


Train, Epoch 15 / 20:   8%|▊         | 124/1563 [00:03<00:41, 34.86it/s]

batch 120 loss: 0.3391610249876976


Train, Epoch 15 / 20:   9%|▊         | 136/1563 [00:03<00:41, 34.35it/s]

batch 130 loss: 0.285434253513813


Train, Epoch 15 / 20:   9%|▉         | 144/1563 [00:04<00:40, 34.77it/s]

batch 140 loss: 0.36008553877472876


Train, Epoch 15 / 20:  10%|▉         | 156/1563 [00:04<00:40, 35.08it/s]

batch 150 loss: 0.31050964444875717


Train, Epoch 15 / 20:  10%|█         | 164/1563 [00:04<00:39, 35.19it/s]

batch 160 loss: 0.29657500684261323


Train, Epoch 15 / 20:  11%|█▏        | 176/1563 [00:05<00:39, 35.55it/s]

batch 170 loss: 0.3304910257458687


Train, Epoch 15 / 20:  12%|█▏        | 184/1563 [00:05<00:38, 35.77it/s]

batch 180 loss: 0.2820557579398155


Train, Epoch 15 / 20:  13%|█▎        | 196/1563 [00:05<00:38, 35.50it/s]

batch 190 loss: 0.3977356553077698


Train, Epoch 15 / 20:  13%|█▎        | 204/1563 [00:05<00:39, 34.82it/s]

batch 200 loss: 0.2806644752621651


Train, Epoch 15 / 20:  14%|█▍        | 216/1563 [00:06<00:38, 35.21it/s]

batch 210 loss: 0.3068711683154106


Train, Epoch 15 / 20:  14%|█▍        | 224/1563 [00:06<00:37, 35.29it/s]

batch 220 loss: 0.37828766107559203


Train, Epoch 15 / 20:  15%|█▌        | 236/1563 [00:06<00:37, 34.96it/s]

batch 230 loss: 0.29918762296438217


Train, Epoch 15 / 20:  16%|█▌        | 244/1563 [00:06<00:37, 35.09it/s]

batch 240 loss: 0.32514491602778434


Train, Epoch 15 / 20:  16%|█▋        | 256/1563 [00:07<00:37, 35.04it/s]

batch 250 loss: 0.2968792848289013


Train, Epoch 15 / 20:  17%|█▋        | 264/1563 [00:07<00:37, 34.80it/s]

batch 260 loss: 0.34875102490186694


Train, Epoch 15 / 20:  18%|█▊        | 276/1563 [00:07<00:36, 35.20it/s]

batch 270 loss: 0.3295207411050797


Train, Epoch 15 / 20:  18%|█▊        | 284/1563 [00:08<00:36, 35.21it/s]

batch 280 loss: 0.2960262790322304


Train, Epoch 15 / 20:  19%|█▉        | 296/1563 [00:08<00:35, 35.64it/s]

batch 290 loss: 0.28450320065021517


Train, Epoch 15 / 20:  19%|█▉        | 304/1563 [00:08<00:36, 34.88it/s]

batch 300 loss: 0.4418897569179535


Train, Epoch 15 / 20:  20%|██        | 316/1563 [00:09<00:34, 35.95it/s]

batch 310 loss: 0.30719574391841886


Train, Epoch 15 / 20:  21%|██        | 324/1563 [00:09<00:34, 36.03it/s]

batch 320 loss: 0.32130463123321534


Train, Epoch 15 / 20:  21%|██▏       | 336/1563 [00:09<00:34, 35.73it/s]

batch 330 loss: 0.30399429649114607


Train, Epoch 15 / 20:  22%|██▏       | 344/1563 [00:09<00:34, 35.30it/s]

batch 340 loss: 0.2877167955040932


Train, Epoch 15 / 20:  23%|██▎       | 356/1563 [00:10<00:33, 35.61it/s]

batch 350 loss: 0.3073012627661228


Train, Epoch 15 / 20:  23%|██▎       | 364/1563 [00:10<00:35, 34.16it/s]

batch 360 loss: 0.3252353101968765


Train, Epoch 15 / 20:  24%|██▍       | 372/1563 [00:10<00:37, 32.12it/s]

batch 370 loss: 0.31304530054330826


Train, Epoch 15 / 20:  25%|██▍       | 384/1563 [00:11<00:37, 31.67it/s]

batch 380 loss: 0.24326538890600205


Train, Epoch 15 / 20:  25%|██▌       | 396/1563 [00:11<00:36, 32.28it/s]

batch 390 loss: 0.32483004927635195


Train, Epoch 15 / 20:  26%|██▌       | 404/1563 [00:11<00:35, 32.68it/s]

batch 400 loss: 0.2708299070596695


Train, Epoch 15 / 20:  27%|██▋       | 416/1563 [00:11<00:34, 32.87it/s]

batch 410 loss: 0.37621896862983706


Train, Epoch 15 / 20:  27%|██▋       | 424/1563 [00:12<00:36, 31.44it/s]

batch 420 loss: 0.26523262113332746


Train, Epoch 15 / 20:  28%|██▊       | 436/1563 [00:12<00:36, 31.05it/s]

batch 430 loss: 0.27723049819469453


Train, Epoch 15 / 20:  28%|██▊       | 443/1563 [00:12<00:38, 29.08it/s]

batch 440 loss: 0.31708174124360083


Train, Epoch 15 / 20:  29%|██▉       | 455/1563 [00:13<00:33, 33.40it/s]

batch 450 loss: 0.28188260793685915


Train, Epoch 15 / 20:  30%|██▉       | 467/1563 [00:13<00:31, 34.66it/s]

batch 460 loss: 0.2319772832095623


Train, Epoch 15 / 20:  30%|███       | 475/1563 [00:13<00:31, 34.68it/s]

batch 470 loss: 0.3236156776547432


Train, Epoch 15 / 20:  31%|███       | 487/1563 [00:14<00:30, 35.22it/s]

batch 480 loss: 0.31442478746175767


Train, Epoch 15 / 20:  32%|███▏      | 495/1563 [00:14<00:30, 35.22it/s]

batch 490 loss: 0.275496232509613


Train, Epoch 15 / 20:  32%|███▏      | 507/1563 [00:14<00:30, 35.08it/s]

batch 500 loss: 0.2622386686503887


Train, Epoch 15 / 20:  33%|███▎      | 515/1563 [00:14<00:29, 35.04it/s]

batch 510 loss: 0.30731507688760756


Train, Epoch 15 / 20:  34%|███▎      | 527/1563 [00:15<00:29, 35.01it/s]

batch 520 loss: 0.34748800843954086


Train, Epoch 15 / 20:  34%|███▍      | 535/1563 [00:15<00:29, 35.24it/s]

batch 530 loss: 0.2946193441748619


Train, Epoch 15 / 20:  35%|███▍      | 547/1563 [00:15<00:29, 35.01it/s]

batch 540 loss: 0.3508910402655602


Train, Epoch 15 / 20:  36%|███▌      | 555/1563 [00:16<00:28, 34.97it/s]

batch 550 loss: 0.2574501268565655


Train, Epoch 15 / 20:  36%|███▋      | 567/1563 [00:16<00:27, 35.74it/s]

batch 560 loss: 0.36838397681713103


Train, Epoch 15 / 20:  37%|███▋      | 575/1563 [00:16<00:27, 35.63it/s]

batch 570 loss: 0.290055637806654


Train, Epoch 15 / 20:  38%|███▊      | 587/1563 [00:17<00:27, 35.76it/s]

batch 580 loss: 0.37272035628557204


Train, Epoch 15 / 20:  38%|███▊      | 595/1563 [00:17<00:27, 35.54it/s]

batch 590 loss: 0.3554502345621586


Train, Epoch 15 / 20:  39%|███▉      | 607/1563 [00:17<00:27, 35.14it/s]

batch 600 loss: 0.2520763762295246


Train, Epoch 15 / 20:  39%|███▉      | 615/1563 [00:17<00:27, 34.94it/s]

batch 610 loss: 0.374350842833519


Train, Epoch 15 / 20:  40%|████      | 627/1563 [00:18<00:26, 35.39it/s]

batch 620 loss: 0.2804622694849968


Train, Epoch 15 / 20:  41%|████      | 635/1563 [00:18<00:26, 35.45it/s]

batch 630 loss: 0.2631758600473404


Train, Epoch 15 / 20:  41%|████▏     | 647/1563 [00:18<00:25, 35.67it/s]

batch 640 loss: 0.29471432864665986


Train, Epoch 15 / 20:  42%|████▏     | 655/1563 [00:18<00:26, 34.92it/s]

batch 650 loss: 0.29471021220088006


Train, Epoch 15 / 20:  43%|████▎     | 667/1563 [00:19<00:25, 35.55it/s]

batch 660 loss: 0.31023232638835907


Train, Epoch 15 / 20:  43%|████▎     | 675/1563 [00:19<00:24, 35.58it/s]

batch 670 loss: 0.2807733714580536


Train, Epoch 15 / 20:  44%|████▍     | 687/1563 [00:19<00:24, 35.70it/s]

batch 680 loss: 0.357596043497324


Train, Epoch 15 / 20:  44%|████▍     | 695/1563 [00:20<00:24, 35.71it/s]

batch 690 loss: 0.35318550318479536


Train, Epoch 15 / 20:  45%|████▌     | 707/1563 [00:20<00:23, 36.03it/s]

batch 700 loss: 0.39452489763498305


Train, Epoch 15 / 20:  46%|████▌     | 715/1563 [00:20<00:23, 35.89it/s]

batch 710 loss: 0.34898901730775833


Train, Epoch 15 / 20:  47%|████▋     | 727/1563 [00:20<00:23, 35.01it/s]

batch 720 loss: 0.3397339940071106


Train, Epoch 15 / 20:  47%|████▋     | 735/1563 [00:21<00:23, 35.09it/s]

batch 730 loss: 0.3981193296611309


Train, Epoch 15 / 20:  48%|████▊     | 747/1563 [00:21<00:22, 35.65it/s]

batch 740 loss: 0.2854867398738861


Train, Epoch 15 / 20:  48%|████▊     | 755/1563 [00:21<00:22, 35.43it/s]

batch 750 loss: 0.28145319521427153


Train, Epoch 15 / 20:  49%|████▉     | 767/1563 [00:22<00:22, 35.30it/s]

batch 760 loss: 0.2895041972398758


Train, Epoch 15 / 20:  50%|████▉     | 775/1563 [00:22<00:22, 35.29it/s]

batch 770 loss: 0.29483295530080794


Train, Epoch 15 / 20:  50%|█████     | 787/1563 [00:22<00:21, 35.60it/s]

batch 780 loss: 0.2841433435678482


Train, Epoch 15 / 20:  51%|█████     | 795/1563 [00:22<00:21, 35.87it/s]

batch 790 loss: 0.29024672955274583


Train, Epoch 15 / 20:  51%|█████▏    | 803/1563 [00:23<00:23, 32.45it/s]

batch 800 loss: 0.3367838442325592


Train, Epoch 15 / 20:  52%|█████▏    | 815/1563 [00:23<00:23, 32.33it/s]

batch 810 loss: 0.32845587432384493


Train, Epoch 15 / 20:  53%|█████▎    | 823/1563 [00:23<00:24, 30.78it/s]

batch 820 loss: 0.32765905261039735


Train, Epoch 15 / 20:  53%|█████▎    | 835/1563 [00:24<00:23, 31.27it/s]

batch 830 loss: 0.27153263092041013


Train, Epoch 15 / 20:  54%|█████▍    | 843/1563 [00:24<00:22, 31.49it/s]

batch 840 loss: 0.33604426831007006


Train, Epoch 15 / 20:  55%|█████▍    | 855/1563 [00:24<00:22, 31.66it/s]

batch 850 loss: 0.35211421102285384


Train, Epoch 15 / 20:  55%|█████▌    | 863/1563 [00:25<00:22, 31.70it/s]

batch 860 loss: 0.37652639746665956


Train, Epoch 15 / 20:  56%|█████▌    | 875/1563 [00:25<00:21, 31.94it/s]

batch 870 loss: 0.37931919693946836


Train, Epoch 15 / 20:  56%|█████▋    | 883/1563 [00:25<00:21, 31.56it/s]

batch 880 loss: 0.386971078813076


Train, Epoch 15 / 20:  57%|█████▋    | 895/1563 [00:26<00:19, 33.89it/s]

batch 890 loss: 0.4263933226466179


Train, Epoch 15 / 20:  58%|█████▊    | 907/1563 [00:26<00:19, 34.45it/s]

batch 900 loss: 0.2914358027279377


Train, Epoch 15 / 20:  59%|█████▊    | 915/1563 [00:26<00:18, 34.52it/s]

batch 910 loss: 0.3061143457889557


Train, Epoch 15 / 20:  59%|█████▉    | 927/1563 [00:26<00:18, 34.71it/s]

batch 920 loss: 0.3431457385420799


Train, Epoch 15 / 20:  60%|█████▉    | 935/1563 [00:27<00:18, 34.63it/s]

batch 930 loss: 0.3611843168735504


Train, Epoch 15 / 20:  61%|██████    | 947/1563 [00:27<00:17, 35.39it/s]

batch 940 loss: 0.3446779727935791


Train, Epoch 15 / 20:  61%|██████    | 955/1563 [00:27<00:17, 35.26it/s]

batch 950 loss: 0.3430264890193939


Train, Epoch 15 / 20:  62%|██████▏   | 967/1563 [00:28<00:16, 35.45it/s]

batch 960 loss: 0.3164170950651169


Train, Epoch 15 / 20:  62%|██████▏   | 975/1563 [00:28<00:16, 35.12it/s]

batch 970 loss: 0.2985988467931747


Train, Epoch 15 / 20:  63%|██████▎   | 987/1563 [00:28<00:16, 35.52it/s]

batch 980 loss: 0.31785863935947417


Train, Epoch 15 / 20:  64%|██████▎   | 995/1563 [00:28<00:16, 35.46it/s]

batch 990 loss: 0.41683838963508607


Train, Epoch 15 / 20:  64%|██████▍   | 1007/1563 [00:29<00:15, 35.44it/s]

batch 1000 loss: 0.3150896854698658


Train, Epoch 15 / 20:  65%|██████▍   | 1015/1563 [00:29<00:15, 35.40it/s]

batch 1010 loss: 0.3529018446803093


Train, Epoch 15 / 20:  66%|██████▌   | 1027/1563 [00:29<00:15, 35.39it/s]

batch 1020 loss: 0.32878207564353945


Train, Epoch 15 / 20:  66%|██████▌   | 1035/1563 [00:30<00:14, 35.34it/s]

batch 1030 loss: 0.40823139399290087


Train, Epoch 15 / 20:  67%|██████▋   | 1047/1563 [00:30<00:14, 35.24it/s]

batch 1040 loss: 0.37826412469148635


Train, Epoch 15 / 20:  67%|██████▋   | 1055/1563 [00:30<00:14, 35.11it/s]

batch 1050 loss: 0.3670077160000801


Train, Epoch 15 / 20:  68%|██████▊   | 1067/1563 [00:30<00:14, 35.22it/s]

batch 1060 loss: 0.3190737202763557


Train, Epoch 15 / 20:  69%|██████▉   | 1075/1563 [00:31<00:13, 35.50it/s]

batch 1070 loss: 0.2698233649134636


Train, Epoch 15 / 20:  70%|██████▉   | 1087/1563 [00:31<00:13, 35.58it/s]

batch 1080 loss: 0.28214761465787885


Train, Epoch 15 / 20:  70%|███████   | 1095/1563 [00:31<00:13, 35.89it/s]

batch 1090 loss: 0.3364138141274452


Train, Epoch 15 / 20:  71%|███████   | 1107/1563 [00:32<00:12, 35.89it/s]

batch 1100 loss: 0.34221313893795013


Train, Epoch 15 / 20:  71%|███████▏  | 1115/1563 [00:32<00:12, 35.78it/s]

batch 1110 loss: 0.2731911100447178


Train, Epoch 15 / 20:  72%|███████▏  | 1127/1563 [00:32<00:12, 35.74it/s]

batch 1120 loss: 0.32751047909259795


Train, Epoch 15 / 20:  73%|███████▎  | 1135/1563 [00:32<00:12, 35.51it/s]

batch 1130 loss: 0.29222231581807134


Train, Epoch 15 / 20:  73%|███████▎  | 1147/1563 [00:33<00:11, 35.58it/s]

batch 1140 loss: 0.2866354860365391


Train, Epoch 15 / 20:  74%|███████▍  | 1155/1563 [00:33<00:11, 35.43it/s]

batch 1150 loss: 0.37054158598184583


Train, Epoch 15 / 20:  75%|███████▍  | 1167/1563 [00:33<00:11, 35.16it/s]

batch 1160 loss: 0.2736324846744537


Train, Epoch 15 / 20:  75%|███████▌  | 1175/1563 [00:34<00:11, 35.01it/s]

batch 1170 loss: 0.2735691897571087


Train, Epoch 15 / 20:  76%|███████▌  | 1187/1563 [00:34<00:10, 34.46it/s]

batch 1180 loss: 0.3470893524587154


Train, Epoch 15 / 20:  76%|███████▋  | 1195/1563 [00:34<00:10, 34.81it/s]

batch 1190 loss: 0.3587157502770424


Train, Epoch 15 / 20:  77%|███████▋  | 1207/1563 [00:34<00:10, 35.33it/s]

batch 1200 loss: 0.2628122255206108


Train, Epoch 15 / 20:  78%|███████▊  | 1215/1563 [00:35<00:09, 35.25it/s]

batch 1210 loss: 0.3181639134883881


Train, Epoch 15 / 20:  79%|███████▊  | 1227/1563 [00:35<00:09, 35.04it/s]

batch 1220 loss: 0.37400430589914324


Train, Epoch 15 / 20:  79%|███████▉  | 1235/1563 [00:35<00:09, 34.39it/s]

batch 1230 loss: 0.3698080822825432


Train, Epoch 15 / 20:  80%|███████▉  | 1243/1563 [00:35<00:09, 33.61it/s]

batch 1240 loss: 0.33238949328660966


Train, Epoch 15 / 20:  80%|████████  | 1255/1563 [00:36<00:09, 33.22it/s]

batch 1250 loss: 0.3632661312818527


Train, Epoch 15 / 20:  81%|████████  | 1263/1563 [00:36<00:09, 32.36it/s]

batch 1260 loss: 0.31624887585639955


Train, Epoch 15 / 20:  82%|████████▏ | 1275/1563 [00:36<00:08, 32.87it/s]

batch 1270 loss: 0.2467452347278595


Train, Epoch 15 / 20:  82%|████████▏ | 1283/1563 [00:37<00:08, 31.68it/s]

batch 1280 loss: 0.33893026411533356


Train, Epoch 15 / 20:  83%|████████▎ | 1295/1563 [00:37<00:08, 30.39it/s]

batch 1290 loss: 0.25337603986263274


Train, Epoch 15 / 20:  84%|████████▎ | 1306/1563 [00:38<00:08, 29.94it/s]

batch 1300 loss: 0.24746938794851303


Train, Epoch 15 / 20:  84%|████████▍ | 1313/1563 [00:38<00:08, 28.96it/s]

batch 1310 loss: 0.37187348753213884


Train, Epoch 15 / 20:  85%|████████▍ | 1324/1563 [00:38<00:07, 31.11it/s]

batch 1320 loss: 0.29945130571722983


Train, Epoch 15 / 20:  85%|████████▌ | 1336/1563 [00:38<00:06, 34.53it/s]

batch 1330 loss: 0.29273920990526675


Train, Epoch 15 / 20:  86%|████████▌ | 1344/1563 [00:39<00:06, 34.94it/s]

batch 1340 loss: 0.31619675904512407


Train, Epoch 15 / 20:  87%|████████▋ | 1356/1563 [00:39<00:05, 35.29it/s]

batch 1350 loss: 0.4016855686903


Train, Epoch 15 / 20:  87%|████████▋ | 1364/1563 [00:39<00:05, 34.58it/s]

batch 1360 loss: 0.30613198727369306


Train, Epoch 15 / 20:  88%|████████▊ | 1376/1563 [00:40<00:05, 34.48it/s]

batch 1370 loss: 0.3308540642261505


Train, Epoch 15 / 20:  89%|████████▊ | 1384/1563 [00:40<00:05, 34.45it/s]

batch 1380 loss: 0.2805979549884796


Train, Epoch 15 / 20:  89%|████████▉ | 1396/1563 [00:40<00:04, 34.26it/s]

batch 1390 loss: 0.3089624881744385


Train, Epoch 15 / 20:  90%|████████▉ | 1404/1563 [00:40<00:04, 34.71it/s]

batch 1400 loss: 0.23987821340560914


Train, Epoch 15 / 20:  91%|█████████ | 1416/1563 [00:41<00:04, 35.62it/s]

batch 1410 loss: 0.3670461155474186


Train, Epoch 15 / 20:  91%|█████████ | 1424/1563 [00:41<00:03, 35.41it/s]

batch 1420 loss: 0.25980252772569656


Train, Epoch 15 / 20:  92%|█████████▏| 1436/1563 [00:41<00:03, 35.25it/s]

batch 1430 loss: 0.3343790993094444


Train, Epoch 15 / 20:  92%|█████████▏| 1444/1563 [00:42<00:03, 35.33it/s]

batch 1440 loss: 0.2727215297520161


Train, Epoch 15 / 20:  93%|█████████▎| 1456/1563 [00:42<00:02, 35.81it/s]

batch 1450 loss: 0.27800520807504653


Train, Epoch 15 / 20:  94%|█████████▎| 1464/1563 [00:42<00:02, 35.52it/s]

batch 1460 loss: 0.3884549081325531


Train, Epoch 15 / 20:  94%|█████████▍| 1476/1563 [00:42<00:02, 35.32it/s]

batch 1470 loss: 0.32440915107727053


Train, Epoch 15 / 20:  95%|█████████▍| 1484/1563 [00:43<00:02, 35.25it/s]

batch 1480 loss: 0.2978617139160633


Train, Epoch 15 / 20:  96%|█████████▌| 1496/1563 [00:43<00:01, 35.03it/s]

batch 1490 loss: 0.30548629313707354


Train, Epoch 15 / 20:  96%|█████████▌| 1504/1563 [00:43<00:01, 34.61it/s]

batch 1500 loss: 0.4194875076413155


Train, Epoch 15 / 20:  97%|█████████▋| 1516/1563 [00:44<00:01, 35.78it/s]

batch 1510 loss: 0.3437606878578663


Train, Epoch 15 / 20:  98%|█████████▊| 1524/1563 [00:44<00:01, 35.67it/s]

batch 1520 loss: 0.3300244078040123


Train, Epoch 15 / 20:  98%|█████████▊| 1536/1563 [00:44<00:00, 35.88it/s]

batch 1530 loss: 0.3898913308978081


Train, Epoch 15 / 20:  99%|█████████▉| 1544/1563 [00:44<00:00, 35.66it/s]

batch 1540 loss: 0.28686990961432457


Train, Epoch 15 / 20: 100%|█████████▉| 1556/1563 [00:45<00:00, 35.88it/s]

batch 1550 loss: 0.3200028419494629


Train, Epoch 15 / 20: 100%|██████████| 1563/1563 [00:45<00:00, 34.42it/s]


batch 1560 loss: 0.32556397616863253


Test, Epoch 15 / 20: 100%|██████████| 1563/1563 [00:22<00:00, 70.32it/s]


Epoch 15, loss: 0.4899166076469421, accuracy: 0.78732


Train, Epoch 16 / 20:   1%|          | 16/1563 [00:00<00:44, 34.72it/s]

batch 10 loss: 0.2757278233766556


Train, Epoch 16 / 20:   2%|▏         | 24/1563 [00:00<00:44, 34.81it/s]

batch 20 loss: 0.2586697369813919


Train, Epoch 16 / 20:   2%|▏         | 36/1563 [00:01<00:43, 35.02it/s]

batch 30 loss: 0.30070996582508086


Train, Epoch 16 / 20:   3%|▎         | 44/1563 [00:01<00:43, 34.98it/s]

batch 40 loss: 0.2822689816355705


Train, Epoch 16 / 20:   4%|▎         | 56/1563 [00:01<00:42, 35.69it/s]

batch 50 loss: 0.3775226354598999


Train, Epoch 16 / 20:   4%|▍         | 64/1563 [00:01<00:42, 35.21it/s]

batch 60 loss: 0.33454314768314364


Train, Epoch 16 / 20:   5%|▍         | 76/1563 [00:02<00:42, 35.29it/s]

batch 70 loss: 0.32240927666425706


Train, Epoch 16 / 20:   5%|▌         | 84/1563 [00:02<00:41, 35.44it/s]

batch 80 loss: 0.3648058116436005


Train, Epoch 16 / 20:   6%|▌         | 96/1563 [00:02<00:41, 35.70it/s]

batch 90 loss: 0.3370631620287895


Train, Epoch 16 / 20:   7%|▋         | 104/1563 [00:02<00:41, 35.48it/s]

batch 100 loss: 0.269662719219923


Train, Epoch 16 / 20:   7%|▋         | 116/1563 [00:03<00:40, 35.76it/s]

batch 110 loss: 0.255764077603817


Train, Epoch 16 / 20:   8%|▊         | 124/1563 [00:03<00:40, 35.54it/s]

batch 120 loss: 0.3236223742365837


Train, Epoch 16 / 20:   9%|▊         | 136/1563 [00:03<00:40, 35.12it/s]

batch 130 loss: 0.36564997434616087


Train, Epoch 16 / 20:   9%|▉         | 144/1563 [00:04<00:40, 35.19it/s]

batch 140 loss: 0.37570628374814985


Train, Epoch 16 / 20:  10%|▉         | 156/1563 [00:04<00:40, 35.17it/s]

batch 150 loss: 0.22752168625593186


Train, Epoch 16 / 20:  10%|█         | 164/1563 [00:04<00:39, 35.37it/s]

batch 160 loss: 0.30616762340068815


Train, Epoch 16 / 20:  11%|█▏        | 176/1563 [00:04<00:38, 36.14it/s]

batch 170 loss: 0.29267627745866776


Train, Epoch 16 / 20:  12%|█▏        | 184/1563 [00:05<00:38, 35.40it/s]

batch 180 loss: 0.2539822839200497


Train, Epoch 16 / 20:  13%|█▎        | 196/1563 [00:05<00:38, 35.66it/s]

batch 190 loss: 0.353461953997612


Train, Epoch 16 / 20:  13%|█▎        | 204/1563 [00:05<00:37, 35.93it/s]

batch 200 loss: 0.26858520954847337


Train, Epoch 16 / 20:  14%|█▍        | 216/1563 [00:06<00:38, 35.05it/s]

batch 210 loss: 0.3274898201227188


Train, Epoch 16 / 20:  14%|█▍        | 224/1563 [00:06<00:39, 33.64it/s]

batch 220 loss: 0.3343049980700016


Train, Epoch 16 / 20:  15%|█▌        | 236/1563 [00:06<00:41, 32.15it/s]

batch 230 loss: 0.25130106285214426


Train, Epoch 16 / 20:  16%|█▌        | 244/1563 [00:07<00:43, 30.03it/s]

batch 240 loss: 0.3397849604487419


Train, Epoch 16 / 20:  16%|█▋        | 256/1563 [00:07<00:42, 30.90it/s]

batch 250 loss: 0.32503690868616103


Train, Epoch 16 / 20:  17%|█▋        | 264/1563 [00:07<00:41, 31.00it/s]

batch 260 loss: 0.28209031373262405


Train, Epoch 16 / 20:  18%|█▊        | 276/1563 [00:08<00:42, 30.22it/s]

batch 270 loss: 0.2679507002234459


Train, Epoch 16 / 20:  18%|█▊        | 284/1563 [00:08<00:42, 30.33it/s]

batch 280 loss: 0.29924110472202303


Train, Epoch 16 / 20:  19%|█▉        | 295/1563 [00:08<00:41, 30.70it/s]

batch 290 loss: 0.3549221888184547


Train, Epoch 16 / 20:  20%|█▉        | 307/1563 [00:09<00:38, 32.29it/s]

batch 300 loss: 0.29214377105236056


Train, Epoch 16 / 20:  20%|██        | 315/1563 [00:09<00:37, 33.64it/s]

batch 310 loss: 0.28578547686338424


Train, Epoch 16 / 20:  21%|██        | 327/1563 [00:09<00:35, 35.27it/s]

batch 320 loss: 0.3774097204208374


Train, Epoch 16 / 20:  21%|██▏       | 335/1563 [00:09<00:34, 35.10it/s]

batch 330 loss: 0.28667816892266273


Train, Epoch 16 / 20:  22%|██▏       | 347/1563 [00:10<00:34, 35.02it/s]

batch 340 loss: 0.2918887235224247


Train, Epoch 16 / 20:  23%|██▎       | 355/1563 [00:10<00:34, 35.01it/s]

batch 350 loss: 0.2600062444806099


Train, Epoch 16 / 20:  23%|██▎       | 367/1563 [00:10<00:33, 35.45it/s]

batch 360 loss: 0.2624863117933273


Train, Epoch 16 / 20:  24%|██▍       | 375/1563 [00:11<00:33, 35.30it/s]

batch 370 loss: 0.24035561680793763


Train, Epoch 16 / 20:  25%|██▍       | 387/1563 [00:11<00:33, 35.54it/s]

batch 380 loss: 0.2512440711259842


Train, Epoch 16 / 20:  25%|██▌       | 395/1563 [00:11<00:33, 35.33it/s]

batch 390 loss: 0.39559134989976885


Train, Epoch 16 / 20:  26%|██▌       | 407/1563 [00:11<00:32, 35.13it/s]

batch 400 loss: 0.23723478093743325


Train, Epoch 16 / 20:  27%|██▋       | 415/1563 [00:12<00:32, 35.41it/s]

batch 410 loss: 0.30002169832587244


Train, Epoch 16 / 20:  27%|██▋       | 427/1563 [00:12<00:31, 35.57it/s]

batch 420 loss: 0.3746666759252548


Train, Epoch 16 / 20:  28%|██▊       | 435/1563 [00:12<00:31, 35.39it/s]

batch 430 loss: 0.35208758860826495


Train, Epoch 16 / 20:  29%|██▊       | 447/1563 [00:13<00:31, 35.79it/s]

batch 440 loss: 0.27824478447437284


Train, Epoch 16 / 20:  29%|██▉       | 455/1563 [00:13<00:31, 34.81it/s]

batch 450 loss: 0.30236681178212166


Train, Epoch 16 / 20:  30%|██▉       | 467/1563 [00:13<00:31, 35.28it/s]

batch 460 loss: 0.4125147357583046


Train, Epoch 16 / 20:  30%|███       | 475/1563 [00:13<00:31, 34.68it/s]

batch 470 loss: 0.27633017748594285


Train, Epoch 16 / 20:  31%|███       | 487/1563 [00:14<00:30, 34.96it/s]

batch 480 loss: 0.32452758997678754


Train, Epoch 16 / 20:  32%|███▏      | 495/1563 [00:14<00:30, 34.63it/s]

batch 490 loss: 0.3351490870118141


Train, Epoch 16 / 20:  32%|███▏      | 507/1563 [00:14<00:29, 35.36it/s]

batch 500 loss: 0.3338132634758949


Train, Epoch 16 / 20:  33%|███▎      | 515/1563 [00:14<00:29, 35.49it/s]

batch 510 loss: 0.2707848630845547


Train, Epoch 16 / 20:  34%|███▎      | 527/1563 [00:15<00:29, 34.91it/s]

batch 520 loss: 0.2580430001020432


Train, Epoch 16 / 20:  34%|███▍      | 535/1563 [00:15<00:29, 35.41it/s]

batch 530 loss: 0.3810639470815659


Train, Epoch 16 / 20:  35%|███▍      | 547/1563 [00:15<00:29, 34.72it/s]

batch 540 loss: 0.23420976400375365


Train, Epoch 16 / 20:  36%|███▌      | 555/1563 [00:16<00:28, 34.99it/s]

batch 550 loss: 0.28118386715650556


Train, Epoch 16 / 20:  36%|███▋      | 567/1563 [00:16<00:28, 35.23it/s]

batch 560 loss: 0.3258990943431854


Train, Epoch 16 / 20:  37%|███▋      | 575/1563 [00:16<00:27, 35.53it/s]

batch 570 loss: 0.2761740393936634


Train, Epoch 16 / 20:  38%|███▊      | 587/1563 [00:17<00:27, 35.93it/s]

batch 580 loss: 0.31508595645427706


Train, Epoch 16 / 20:  38%|███▊      | 595/1563 [00:17<00:27, 35.79it/s]

batch 590 loss: 0.27413468435406685


Train, Epoch 16 / 20:  39%|███▉      | 607/1563 [00:17<00:26, 35.82it/s]

batch 600 loss: 0.26697003543376924


Train, Epoch 16 / 20:  39%|███▉      | 615/1563 [00:17<00:26, 35.51it/s]

batch 610 loss: 0.3358773782849312


Train, Epoch 16 / 20:  40%|████      | 627/1563 [00:18<00:26, 35.20it/s]

batch 620 loss: 0.24473675042390824


Train, Epoch 16 / 20:  41%|████      | 635/1563 [00:18<00:26, 34.91it/s]

batch 630 loss: 0.32017036974430085


Train, Epoch 16 / 20:  41%|████▏     | 647/1563 [00:18<00:26, 35.06it/s]

batch 640 loss: 0.29581493213772775


Train, Epoch 16 / 20:  42%|████▏     | 655/1563 [00:19<00:26, 33.88it/s]

batch 650 loss: 0.4667235463857651


Train, Epoch 16 / 20:  42%|████▏     | 663/1563 [00:19<00:27, 32.16it/s]

batch 660 loss: 0.36312819868326185


Train, Epoch 16 / 20:  43%|████▎     | 675/1563 [00:19<00:28, 31.46it/s]

batch 670 loss: 0.3070410303771496


Train, Epoch 16 / 20:  44%|████▎     | 683/1563 [00:19<00:27, 32.05it/s]

batch 680 loss: 0.28134585097432135


Train, Epoch 16 / 20:  44%|████▍     | 695/1563 [00:20<00:26, 32.58it/s]

batch 690 loss: 0.30658743977546693


Train, Epoch 16 / 20:  45%|████▍     | 703/1563 [00:20<00:26, 32.64it/s]

batch 700 loss: 0.2893937580287457


Train, Epoch 16 / 20:  46%|████▌     | 715/1563 [00:20<00:26, 31.43it/s]

batch 710 loss: 0.3517035335302353


Train, Epoch 16 / 20:  46%|████▋     | 723/1563 [00:21<00:26, 31.30it/s]

batch 720 loss: 0.35920713692903516


Train, Epoch 16 / 20:  47%|████▋     | 735/1563 [00:21<00:26, 31.37it/s]

batch 730 loss: 0.29135734438896177


Train, Epoch 16 / 20:  48%|████▊     | 747/1563 [00:21<00:24, 33.53it/s]

batch 740 loss: 0.285004161298275


Train, Epoch 16 / 20:  48%|████▊     | 755/1563 [00:22<00:23, 33.94it/s]

batch 750 loss: 0.2860813707113266


Train, Epoch 16 / 20:  49%|████▉     | 767/1563 [00:22<00:22, 34.68it/s]

batch 760 loss: 0.2657105214893818


Train, Epoch 16 / 20:  50%|████▉     | 775/1563 [00:22<00:22, 34.85it/s]

batch 770 loss: 0.3372915953397751


Train, Epoch 16 / 20:  50%|█████     | 787/1563 [00:23<00:22, 34.99it/s]

batch 780 loss: 0.33333569169044497


Train, Epoch 16 / 20:  51%|█████     | 795/1563 [00:23<00:21, 35.12it/s]

batch 790 loss: 0.3053474098443985


Train, Epoch 16 / 20:  52%|█████▏    | 807/1563 [00:23<00:21, 34.56it/s]

batch 800 loss: 0.32494801208376883


Train, Epoch 16 / 20:  52%|█████▏    | 815/1563 [00:23<00:21, 35.17it/s]

batch 810 loss: 0.2906952738761902


Train, Epoch 16 / 20:  53%|█████▎    | 827/1563 [00:24<00:20, 35.28it/s]

batch 820 loss: 0.273505862057209


Train, Epoch 16 / 20:  53%|█████▎    | 835/1563 [00:24<00:20, 35.29it/s]

batch 830 loss: 0.3237760778516531


Train, Epoch 16 / 20:  54%|█████▍    | 847/1563 [00:24<00:20, 35.25it/s]

batch 840 loss: 0.29854175448417664


Train, Epoch 16 / 20:  55%|█████▍    | 855/1563 [00:24<00:19, 35.66it/s]

batch 850 loss: 0.40292018949985503


Train, Epoch 16 / 20:  55%|█████▌    | 867/1563 [00:25<00:19, 35.66it/s]

batch 860 loss: 0.2863077610731125


Train, Epoch 16 / 20:  56%|█████▌    | 875/1563 [00:25<00:19, 35.44it/s]

batch 870 loss: 0.3299871787428856


Train, Epoch 16 / 20:  57%|█████▋    | 887/1563 [00:25<00:19, 34.89it/s]

batch 880 loss: 0.27484287694096565


Train, Epoch 16 / 20:  57%|█████▋    | 895/1563 [00:26<00:19, 34.99it/s]

batch 890 loss: 0.32359021678566935


Train, Epoch 16 / 20:  58%|█████▊    | 907/1563 [00:26<00:18, 34.87it/s]

batch 900 loss: 0.3300174415111542


Train, Epoch 16 / 20:  59%|█████▊    | 915/1563 [00:26<00:18, 34.93it/s]

batch 910 loss: 0.39593829959630966


Train, Epoch 16 / 20:  59%|█████▉    | 927/1563 [00:27<00:17, 35.47it/s]

batch 920 loss: 0.33345284312963486


Train, Epoch 16 / 20:  60%|█████▉    | 935/1563 [00:27<00:17, 35.29it/s]

batch 930 loss: 0.2897918626666069


Train, Epoch 16 / 20:  61%|██████    | 947/1563 [00:27<00:17, 35.00it/s]

batch 940 loss: 0.33060840591788293


Train, Epoch 16 / 20:  61%|██████    | 955/1563 [00:27<00:17, 34.94it/s]

batch 950 loss: 0.3263154149055481


Train, Epoch 16 / 20:  62%|██████▏   | 967/1563 [00:28<00:16, 35.38it/s]

batch 960 loss: 0.3304865211248398


Train, Epoch 16 / 20:  62%|██████▏   | 975/1563 [00:28<00:16, 35.37it/s]

batch 970 loss: 0.29468833804130556


Train, Epoch 16 / 20:  63%|██████▎   | 987/1563 [00:28<00:16, 35.05it/s]

batch 980 loss: 0.29995498061180115


Train, Epoch 16 / 20:  64%|██████▎   | 995/1563 [00:28<00:16, 34.78it/s]

batch 990 loss: 0.3232136994600296


Train, Epoch 16 / 20:  64%|██████▍   | 1007/1563 [00:29<00:15, 35.75it/s]

batch 1000 loss: 0.3293001189827919


Train, Epoch 16 / 20:  65%|██████▍   | 1015/1563 [00:29<00:15, 35.40it/s]

batch 1010 loss: 0.3451520919799805


Train, Epoch 16 / 20:  66%|██████▌   | 1027/1563 [00:29<00:15, 35.19it/s]

batch 1020 loss: 0.2965434715151787


Train, Epoch 16 / 20:  66%|██████▌   | 1035/1563 [00:30<00:14, 35.38it/s]

batch 1030 loss: 0.3342210128903389


Train, Epoch 16 / 20:  67%|██████▋   | 1043/1563 [00:30<00:15, 34.24it/s]

batch 1040 loss: 0.33536593466997144


Train, Epoch 16 / 20:  67%|██████▋   | 1055/1563 [00:30<00:14, 35.05it/s]

batch 1050 loss: 0.33705676943063734


Train, Epoch 16 / 20:  68%|██████▊   | 1067/1563 [00:31<00:14, 35.22it/s]

batch 1060 loss: 0.2880652979016304


Train, Epoch 16 / 20:  69%|██████▉   | 1075/1563 [00:31<00:13, 35.15it/s]

batch 1070 loss: 0.2626601941883564


Train, Epoch 16 / 20:  70%|██████▉   | 1087/1563 [00:31<00:13, 35.00it/s]

batch 1080 loss: 0.2875439539551735


Train, Epoch 16 / 20:  70%|███████   | 1095/1563 [00:31<00:14, 33.03it/s]

batch 1090 loss: 0.2927014917135239


Train, Epoch 16 / 20:  71%|███████   | 1103/1563 [00:32<00:14, 32.03it/s]

batch 1100 loss: 0.2750334493815899


Train, Epoch 16 / 20:  71%|███████▏  | 1115/1563 [00:32<00:13, 32.36it/s]

batch 1110 loss: 0.38623576462268827


Train, Epoch 16 / 20:  72%|███████▏  | 1123/1563 [00:32<00:13, 32.74it/s]

batch 1120 loss: 0.35394023209810255


Train, Epoch 16 / 20:  73%|███████▎  | 1135/1563 [00:33<00:12, 33.43it/s]

batch 1130 loss: 0.37612935304641726


Train, Epoch 16 / 20:  73%|███████▎  | 1143/1563 [00:33<00:13, 32.00it/s]

batch 1140 loss: 0.35400014370679855


Train, Epoch 16 / 20:  74%|███████▍  | 1155/1563 [00:33<00:12, 31.57it/s]

batch 1150 loss: 0.33601488918066025


Train, Epoch 16 / 20:  74%|███████▍  | 1163/1563 [00:34<00:13, 30.48it/s]

batch 1160 loss: 0.3253913760185242


Train, Epoch 16 / 20:  75%|███████▌  | 1175/1563 [00:34<00:12, 30.99it/s]

batch 1170 loss: 0.271127063781023


Train, Epoch 16 / 20:  76%|███████▌  | 1187/1563 [00:34<00:11, 33.63it/s]

batch 1180 loss: 0.3444099023938179


Train, Epoch 16 / 20:  76%|███████▋  | 1195/1563 [00:34<00:10, 33.86it/s]

batch 1190 loss: 0.30679875910282134


Train, Epoch 16 / 20:  77%|███████▋  | 1207/1563 [00:35<00:10, 34.72it/s]

batch 1200 loss: 0.3493185073137283


Train, Epoch 16 / 20:  78%|███████▊  | 1215/1563 [00:35<00:10, 34.60it/s]

batch 1210 loss: 0.2952365517616272


Train, Epoch 16 / 20:  79%|███████▊  | 1227/1563 [00:35<00:09, 35.34it/s]

batch 1220 loss: 0.2970627933740616


Train, Epoch 16 / 20:  79%|███████▉  | 1235/1563 [00:36<00:09, 35.54it/s]

batch 1230 loss: 0.33335325717926023


Train, Epoch 16 / 20:  80%|███████▉  | 1247/1563 [00:36<00:08, 35.84it/s]

batch 1240 loss: 0.2688782334327698


Train, Epoch 16 / 20:  80%|████████  | 1255/1563 [00:36<00:08, 35.46it/s]

batch 1250 loss: 0.27616593092679975


Train, Epoch 16 / 20:  81%|████████  | 1267/1563 [00:37<00:08, 35.00it/s]

batch 1260 loss: 0.3053437888622284


Train, Epoch 16 / 20:  82%|████████▏ | 1275/1563 [00:37<00:08, 35.13it/s]

batch 1270 loss: 0.27098951265215876


Train, Epoch 16 / 20:  82%|████████▏ | 1287/1563 [00:37<00:07, 35.39it/s]

batch 1280 loss: 0.23195479586720466


Train, Epoch 16 / 20:  83%|████████▎ | 1295/1563 [00:37<00:07, 35.00it/s]

batch 1290 loss: 0.3069599449634552


Train, Epoch 16 / 20:  84%|████████▎ | 1307/1563 [00:38<00:07, 34.52it/s]

batch 1300 loss: 0.3810114338994026


Train, Epoch 16 / 20:  84%|████████▍ | 1315/1563 [00:38<00:07, 34.66it/s]

batch 1310 loss: 0.3141918607056141


Train, Epoch 16 / 20:  85%|████████▍ | 1327/1563 [00:38<00:06, 35.24it/s]

batch 1320 loss: 0.27051804065704343


Train, Epoch 16 / 20:  85%|████████▌ | 1335/1563 [00:38<00:06, 34.96it/s]

batch 1330 loss: 0.26922028213739396


Train, Epoch 16 / 20:  86%|████████▌ | 1347/1563 [00:39<00:06, 35.05it/s]

batch 1340 loss: 0.30289975106716155


Train, Epoch 16 / 20:  87%|████████▋ | 1355/1563 [00:39<00:05, 35.36it/s]

batch 1350 loss: 0.347676033526659


Train, Epoch 16 / 20:  87%|████████▋ | 1367/1563 [00:39<00:05, 35.86it/s]

batch 1360 loss: 0.350756011903286


Train, Epoch 16 / 20:  88%|████████▊ | 1375/1563 [00:40<00:05, 34.94it/s]

batch 1370 loss: 0.3128626473248005


Train, Epoch 16 / 20:  89%|████████▊ | 1387/1563 [00:40<00:05, 35.16it/s]

batch 1380 loss: 0.2914660945534706


Train, Epoch 16 / 20:  89%|████████▉ | 1395/1563 [00:40<00:04, 35.32it/s]

batch 1390 loss: 0.2970173507928848


Train, Epoch 16 / 20:  90%|█████████ | 1407/1563 [00:41<00:04, 35.64it/s]

batch 1400 loss: 0.30180182307958603


Train, Epoch 16 / 20:  91%|█████████ | 1415/1563 [00:41<00:04, 34.60it/s]

batch 1410 loss: 0.33382650315761564


Train, Epoch 16 / 20:  91%|█████████▏| 1427/1563 [00:41<00:03, 34.37it/s]

batch 1420 loss: 0.30659561008214953


Train, Epoch 16 / 20:  92%|█████████▏| 1435/1563 [00:41<00:03, 34.61it/s]

batch 1430 loss: 0.29369412511587145


Train, Epoch 16 / 20:  93%|█████████▎| 1447/1563 [00:42<00:03, 34.45it/s]

batch 1440 loss: 0.24863608032464982


Train, Epoch 16 / 20:  93%|█████████▎| 1455/1563 [00:42<00:03, 34.98it/s]

batch 1450 loss: 0.2903270974755287


Train, Epoch 16 / 20:  94%|█████████▍| 1467/1563 [00:42<00:02, 35.57it/s]

batch 1460 loss: 0.35778004080057146


Train, Epoch 16 / 20:  94%|█████████▍| 1475/1563 [00:42<00:02, 35.62it/s]

batch 1470 loss: 0.357249940931797


Train, Epoch 16 / 20:  95%|█████████▌| 1487/1563 [00:43<00:02, 35.08it/s]

batch 1480 loss: 0.3326445326209068


Train, Epoch 16 / 20:  96%|█████████▌| 1495/1563 [00:43<00:01, 34.99it/s]

batch 1490 loss: 0.3248592182993889


Train, Epoch 16 / 20:  96%|█████████▋| 1507/1563 [00:43<00:01, 35.18it/s]

batch 1500 loss: 0.25292748510837554


Train, Epoch 16 / 20:  97%|█████████▋| 1515/1563 [00:44<00:01, 35.22it/s]

batch 1510 loss: 0.37463380843400956


Train, Epoch 16 / 20:  97%|█████████▋| 1523/1563 [00:44<00:01, 34.20it/s]

batch 1520 loss: 0.32859846875071524


Train, Epoch 16 / 20:  98%|█████████▊| 1535/1563 [00:44<00:00, 33.13it/s]

batch 1530 loss: 0.33135032653808594


Train, Epoch 16 / 20:  99%|█████████▊| 1543/1563 [00:45<00:00, 31.92it/s]

batch 1540 loss: 0.312431612610817


Train, Epoch 16 / 20:  99%|█████████▉| 1555/1563 [00:45<00:00, 31.44it/s]

batch 1550 loss: 0.31894719898700713


Train, Epoch 16 / 20: 100%|██████████| 1563/1563 [00:45<00:00, 34.23it/s]


batch 1560 loss: 0.3089890643954277


Test, Epoch 16 / 20: 100%|██████████| 1563/1563 [00:22<00:00, 71.01it/s]


Epoch 16, loss: 0.4921092313545942, accuracy: 0.79692


Train, Epoch 17 / 20:   1%|          | 16/1563 [00:00<00:44, 34.96it/s]

batch 10 loss: 0.38852394074201585


Train, Epoch 17 / 20:   2%|▏         | 24/1563 [00:00<00:43, 35.09it/s]

batch 20 loss: 0.30224204659461973


Train, Epoch 17 / 20:   2%|▏         | 36/1563 [00:01<00:42, 35.72it/s]

batch 30 loss: 0.2069300465285778


Train, Epoch 17 / 20:   3%|▎         | 44/1563 [00:01<00:42, 35.38it/s]

batch 40 loss: 0.2955475836992264


Train, Epoch 17 / 20:   4%|▎         | 56/1563 [00:01<00:42, 35.81it/s]

batch 50 loss: 0.31129642724990847


Train, Epoch 17 / 20:   4%|▍         | 64/1563 [00:01<00:41, 35.78it/s]

batch 60 loss: 0.28821205645799636


Train, Epoch 17 / 20:   5%|▍         | 76/1563 [00:02<00:41, 35.47it/s]

batch 70 loss: 0.332881797850132


Train, Epoch 17 / 20:   5%|▌         | 84/1563 [00:02<00:43, 33.78it/s]

batch 80 loss: 0.22374455258250237


Train, Epoch 17 / 20:   6%|▌         | 96/1563 [00:02<00:45, 32.22it/s]

batch 90 loss: 0.3056896388530731


Train, Epoch 17 / 20:   7%|▋         | 104/1563 [00:03<00:45, 31.78it/s]

batch 100 loss: 0.34527958035469053


Train, Epoch 17 / 20:   7%|▋         | 116/1563 [00:03<00:45, 31.58it/s]

batch 110 loss: 0.27061601132154467


Train, Epoch 17 / 20:   8%|▊         | 124/1563 [00:03<00:45, 31.93it/s]

batch 120 loss: 0.28972955569624903


Train, Epoch 17 / 20:   9%|▊         | 136/1563 [00:04<00:45, 31.35it/s]

batch 130 loss: 0.2340344212949276


Train, Epoch 17 / 20:   9%|▉         | 144/1563 [00:04<00:46, 30.47it/s]

batch 140 loss: 0.2778057962656021


Train, Epoch 17 / 20:  10%|▉         | 155/1563 [00:04<00:46, 30.02it/s]

batch 150 loss: 0.24006590843200684


Train, Epoch 17 / 20:  11%|█         | 167/1563 [00:05<00:44, 31.36it/s]

batch 160 loss: 0.28624196648597716


Train, Epoch 17 / 20:  11%|█         | 175/1563 [00:05<00:41, 33.30it/s]

batch 170 loss: 0.19130177348852156


Train, Epoch 17 / 20:  12%|█▏        | 187/1563 [00:05<00:39, 34.46it/s]

batch 180 loss: 0.25399538576602937


Train, Epoch 17 / 20:  12%|█▏        | 195/1563 [00:05<00:39, 34.92it/s]

batch 190 loss: 0.3116611361503601


Train, Epoch 17 / 20:  13%|█▎        | 207/1563 [00:06<00:38, 35.58it/s]

batch 200 loss: 0.2602213151752949


Train, Epoch 17 / 20:  14%|█▍        | 215/1563 [00:06<00:38, 35.42it/s]

batch 210 loss: 0.2226799100637436


Train, Epoch 17 / 20:  15%|█▍        | 227/1563 [00:06<00:37, 35.45it/s]

batch 220 loss: 0.2958363309502602


Train, Epoch 17 / 20:  15%|█▌        | 235/1563 [00:07<00:37, 35.47it/s]

batch 230 loss: 0.3283612564206123


Train, Epoch 17 / 20:  16%|█▌        | 247/1563 [00:07<00:36, 35.68it/s]

batch 240 loss: 0.3128806121647358


Train, Epoch 17 / 20:  16%|█▋        | 255/1563 [00:07<00:36, 35.52it/s]

batch 250 loss: 0.27030845582485197


Train, Epoch 17 / 20:  17%|█▋        | 267/1563 [00:07<00:36, 35.57it/s]

batch 260 loss: 0.3492250606417656


Train, Epoch 17 / 20:  18%|█▊        | 275/1563 [00:08<00:36, 35.04it/s]

batch 270 loss: 0.2419571451842785


Train, Epoch 17 / 20:  18%|█▊        | 287/1563 [00:08<00:36, 35.10it/s]

batch 280 loss: 0.2828860193490982


Train, Epoch 17 / 20:  19%|█▉        | 295/1563 [00:08<00:36, 35.14it/s]

batch 290 loss: 0.26305182576179503


Train, Epoch 17 / 20:  20%|█▉        | 307/1563 [00:09<00:35, 35.56it/s]

batch 300 loss: 0.3622400060296059


Train, Epoch 17 / 20:  20%|██        | 315/1563 [00:09<00:35, 35.36it/s]

batch 310 loss: 0.2838496133685112


Train, Epoch 17 / 20:  21%|██        | 327/1563 [00:09<00:34, 35.32it/s]

batch 320 loss: 0.28389261588454245


Train, Epoch 17 / 20:  21%|██▏       | 335/1563 [00:09<00:34, 35.10it/s]

batch 330 loss: 0.27251011356711385


Train, Epoch 17 / 20:  22%|██▏       | 347/1563 [00:10<00:34, 35.36it/s]

batch 340 loss: 0.2606608681380749


Train, Epoch 17 / 20:  23%|██▎       | 355/1563 [00:10<00:34, 35.31it/s]

batch 350 loss: 0.27373836636543275


Train, Epoch 17 / 20:  23%|██▎       | 367/1563 [00:10<00:34, 34.37it/s]

batch 360 loss: 0.24847670271992683


Train, Epoch 17 / 20:  24%|██▍       | 375/1563 [00:11<00:34, 34.73it/s]

batch 370 loss: 0.2568641915917397


Train, Epoch 17 / 20:  25%|██▍       | 387/1563 [00:11<00:34, 34.54it/s]

batch 380 loss: 0.29047287702560426


Train, Epoch 17 / 20:  25%|██▌       | 395/1563 [00:11<00:34, 34.28it/s]

batch 390 loss: 0.3777914494276047


Train, Epoch 17 / 20:  26%|██▌       | 407/1563 [00:11<00:33, 34.51it/s]

batch 400 loss: 0.34248785227537154


Train, Epoch 17 / 20:  27%|██▋       | 415/1563 [00:12<00:33, 34.77it/s]

batch 410 loss: 0.24593991488218309


Train, Epoch 17 / 20:  27%|██▋       | 427/1563 [00:12<00:32, 35.16it/s]

batch 420 loss: 0.35329265892505646


Train, Epoch 17 / 20:  28%|██▊       | 435/1563 [00:12<00:32, 35.12it/s]

batch 430 loss: 0.29313004910945895


Train, Epoch 17 / 20:  29%|██▊       | 447/1563 [00:13<00:31, 35.60it/s]

batch 440 loss: 0.29393636137247087


Train, Epoch 17 / 20:  29%|██▉       | 455/1563 [00:13<00:31, 35.53it/s]

batch 450 loss: 0.255084989964962


Train, Epoch 17 / 20:  30%|██▉       | 467/1563 [00:13<00:30, 35.49it/s]

batch 460 loss: 0.21784889847040176


Train, Epoch 17 / 20:  30%|███       | 475/1563 [00:13<00:30, 35.17it/s]

batch 470 loss: 0.34502519816160204


Train, Epoch 17 / 20:  31%|███       | 487/1563 [00:14<00:30, 35.32it/s]

batch 480 loss: 0.2872352346777916


Train, Epoch 17 / 20:  32%|███▏      | 495/1563 [00:14<00:30, 34.68it/s]

batch 490 loss: 0.3147937074303627


Train, Epoch 17 / 20:  32%|███▏      | 507/1563 [00:14<00:29, 35.28it/s]

batch 500 loss: 0.3161324426531792


Train, Epoch 17 / 20:  33%|███▎      | 515/1563 [00:15<00:29, 35.06it/s]

batch 510 loss: 0.3197805527597666


Train, Epoch 17 / 20:  33%|███▎      | 523/1563 [00:15<00:32, 32.34it/s]

batch 520 loss: 0.29789667427539823


Train, Epoch 17 / 20:  34%|███▍      | 535/1563 [00:15<00:31, 33.07it/s]

batch 530 loss: 0.2916948370635509


Train, Epoch 17 / 20:  35%|███▍      | 543/1563 [00:15<00:30, 33.41it/s]

batch 540 loss: 0.3077363222837448


Train, Epoch 17 / 20:  36%|███▌      | 555/1563 [00:16<00:30, 32.56it/s]

batch 550 loss: 0.24607585221529008


Train, Epoch 17 / 20:  36%|███▌      | 563/1563 [00:16<00:31, 31.30it/s]

batch 560 loss: 0.3528662949800491


Train, Epoch 17 / 20:  37%|███▋      | 575/1563 [00:16<00:32, 30.87it/s]

batch 570 loss: 0.3300434023141861


Train, Epoch 17 / 20:  37%|███▋      | 583/1563 [00:17<00:32, 30.58it/s]

batch 580 loss: 0.3633309990167618


Train, Epoch 17 / 20:  38%|███▊      | 595/1563 [00:17<00:31, 30.52it/s]

batch 590 loss: 0.3316988594830036


Train, Epoch 17 / 20:  39%|███▉      | 607/1563 [00:17<00:29, 32.29it/s]

batch 600 loss: 0.4133393749594688


Train, Epoch 17 / 20:  39%|███▉      | 615/1563 [00:18<00:28, 33.51it/s]

batch 610 loss: 0.34016205817461015


Train, Epoch 17 / 20:  40%|████      | 627/1563 [00:18<00:26, 35.12it/s]

batch 620 loss: 0.330426225066185


Train, Epoch 17 / 20:  41%|████      | 635/1563 [00:18<00:26, 35.14it/s]

batch 630 loss: 0.26512611508369444


Train, Epoch 17 / 20:  41%|████▏     | 647/1563 [00:19<00:26, 34.95it/s]

batch 640 loss: 0.2989328160881996


Train, Epoch 17 / 20:  42%|████▏     | 655/1563 [00:19<00:25, 35.08it/s]

batch 650 loss: 0.3136063516139984


Train, Epoch 17 / 20:  43%|████▎     | 667/1563 [00:19<00:25, 35.33it/s]

batch 660 loss: 0.2836023487150669


Train, Epoch 17 / 20:  43%|████▎     | 675/1563 [00:19<00:25, 35.30it/s]

batch 670 loss: 0.3158095136284828


Train, Epoch 17 / 20:  44%|████▍     | 687/1563 [00:20<00:24, 35.49it/s]

batch 680 loss: 0.28107266649603846


Train, Epoch 17 / 20:  44%|████▍     | 695/1563 [00:20<00:24, 34.82it/s]

batch 690 loss: 0.2731705136597157


Train, Epoch 17 / 20:  45%|████▌     | 707/1563 [00:20<00:24, 35.43it/s]

batch 700 loss: 0.26330714821815493


Train, Epoch 17 / 20:  46%|████▌     | 715/1563 [00:21<00:24, 35.29it/s]

batch 710 loss: 0.24594317898154258


Train, Epoch 17 / 20:  47%|████▋     | 727/1563 [00:21<00:23, 35.13it/s]

batch 720 loss: 0.31266830712556837


Train, Epoch 17 / 20:  47%|████▋     | 735/1563 [00:21<00:23, 35.30it/s]

batch 730 loss: 0.3277693912386894


Train, Epoch 17 / 20:  48%|████▊     | 747/1563 [00:21<00:23, 35.13it/s]

batch 740 loss: 0.31065298318862916


Train, Epoch 17 / 20:  48%|████▊     | 755/1563 [00:22<00:23, 34.67it/s]

batch 750 loss: 0.29417224079370496


Train, Epoch 17 / 20:  49%|████▉     | 767/1563 [00:22<00:22, 35.31it/s]

batch 760 loss: 0.27192661315202715


Train, Epoch 17 / 20:  50%|████▉     | 775/1563 [00:22<00:22, 35.07it/s]

batch 770 loss: 0.2655709385871887


Train, Epoch 17 / 20:  50%|█████     | 787/1563 [00:23<00:22, 34.92it/s]

batch 780 loss: 0.19391495138406753


Train, Epoch 17 / 20:  51%|█████     | 795/1563 [00:23<00:22, 34.61it/s]

batch 790 loss: 0.26549877002835276


Train, Epoch 17 / 20:  52%|█████▏    | 807/1563 [00:23<00:21, 35.44it/s]

batch 800 loss: 0.24461483880877494


Train, Epoch 17 / 20:  52%|█████▏    | 815/1563 [00:23<00:20, 35.67it/s]

batch 810 loss: 0.31423441767692567


Train, Epoch 17 / 20:  53%|█████▎    | 827/1563 [00:24<00:21, 34.86it/s]

batch 820 loss: 0.3272271931171417


Train, Epoch 17 / 20:  53%|█████▎    | 835/1563 [00:24<00:20, 35.10it/s]

batch 830 loss: 0.4019321590662003


Train, Epoch 17 / 20:  54%|█████▍    | 847/1563 [00:24<00:20, 35.38it/s]

batch 840 loss: 0.2671838000416756


Train, Epoch 17 / 20:  55%|█████▍    | 855/1563 [00:25<00:20, 34.81it/s]

batch 850 loss: 0.28941774517297747


Train, Epoch 17 / 20:  55%|█████▌    | 867/1563 [00:25<00:19, 35.34it/s]

batch 860 loss: 0.3602608233690262


Train, Epoch 17 / 20:  56%|█████▌    | 875/1563 [00:25<00:19, 35.12it/s]

batch 870 loss: 0.3264769479632378


Train, Epoch 17 / 20:  57%|█████▋    | 887/1563 [00:25<00:19, 34.94it/s]

batch 880 loss: 0.26002794355154035


Train, Epoch 17 / 20:  57%|█████▋    | 895/1563 [00:26<00:19, 35.10it/s]

batch 890 loss: 0.3080394729971886


Train, Epoch 17 / 20:  58%|█████▊    | 907/1563 [00:26<00:18, 34.53it/s]

batch 900 loss: 0.3730825543403625


Train, Epoch 17 / 20:  59%|█████▊    | 915/1563 [00:26<00:18, 34.95it/s]

batch 910 loss: 0.284494386613369


Train, Epoch 17 / 20:  59%|█████▉    | 927/1563 [00:27<00:18, 35.13it/s]

batch 920 loss: 0.35374650806188584


Train, Epoch 17 / 20:  60%|█████▉    | 935/1563 [00:27<00:18, 34.44it/s]

batch 930 loss: 0.25619140118360517


Train, Epoch 17 / 20:  61%|██████    | 947/1563 [00:27<00:17, 35.33it/s]

batch 940 loss: 0.3270981252193451


Train, Epoch 17 / 20:  61%|██████    | 955/1563 [00:27<00:18, 33.73it/s]

batch 950 loss: 0.3432741194963455


Train, Epoch 17 / 20:  62%|██████▏   | 963/1563 [00:28<00:18, 31.72it/s]

batch 960 loss: 0.2850187838077545


Train, Epoch 17 / 20:  62%|██████▏   | 975/1563 [00:28<00:18, 31.47it/s]

batch 970 loss: 0.2441958889365196


Train, Epoch 17 / 20:  63%|██████▎   | 983/1563 [00:28<00:18, 31.17it/s]

batch 980 loss: 0.38725768476724626


Train, Epoch 17 / 20:  64%|██████▎   | 995/1563 [00:29<00:17, 31.67it/s]

batch 990 loss: 0.37829810529947283


Train, Epoch 17 / 20:  64%|██████▍   | 1003/1563 [00:29<00:17, 31.49it/s]

batch 1000 loss: 0.2767359480261803


Train, Epoch 17 / 20:  65%|██████▍   | 1015/1563 [00:29<00:17, 32.02it/s]

batch 1010 loss: 0.3253332108259201


Train, Epoch 17 / 20:  65%|██████▌   | 1023/1563 [00:30<00:16, 32.30it/s]

batch 1020 loss: 0.2828339450061321


Train, Epoch 17 / 20:  66%|██████▌   | 1035/1563 [00:30<00:16, 31.65it/s]

batch 1030 loss: 0.3473014637827873


Train, Epoch 17 / 20:  67%|██████▋   | 1047/1563 [00:30<00:15, 33.72it/s]

batch 1040 loss: 0.31136543601751326


Train, Epoch 17 / 20:  67%|██████▋   | 1055/1563 [00:31<00:14, 34.23it/s]

batch 1050 loss: 0.25813510790467264


Train, Epoch 17 / 20:  68%|██████▊   | 1067/1563 [00:31<00:14, 34.68it/s]

batch 1060 loss: 0.3200177974998951


Train, Epoch 17 / 20:  69%|██████▉   | 1075/1563 [00:31<00:14, 34.78it/s]

batch 1070 loss: 0.20851440504193305


Train, Epoch 17 / 20:  70%|██████▉   | 1087/1563 [00:31<00:13, 35.35it/s]

batch 1080 loss: 0.3057928718626499


Train, Epoch 17 / 20:  70%|███████   | 1095/1563 [00:32<00:13, 35.17it/s]

batch 1090 loss: 0.3141613319516182


Train, Epoch 17 / 20:  71%|███████   | 1107/1563 [00:32<00:13, 34.75it/s]

batch 1100 loss: 0.34787079095840456


Train, Epoch 17 / 20:  71%|███████▏  | 1115/1563 [00:32<00:12, 34.73it/s]

batch 1110 loss: 0.2688091278076172


Train, Epoch 17 / 20:  72%|███████▏  | 1127/1563 [00:33<00:12, 35.14it/s]

batch 1120 loss: 0.30321445167064665


Train, Epoch 17 / 20:  73%|███████▎  | 1135/1563 [00:33<00:12, 35.25it/s]

batch 1130 loss: 0.22793462723493577


Train, Epoch 17 / 20:  73%|███████▎  | 1147/1563 [00:33<00:11, 35.46it/s]

batch 1140 loss: 0.26764346212148665


Train, Epoch 17 / 20:  74%|███████▍  | 1155/1563 [00:33<00:11, 35.50it/s]

batch 1150 loss: 0.3762559205293655


Train, Epoch 17 / 20:  75%|███████▍  | 1167/1563 [00:34<00:11, 35.12it/s]

batch 1160 loss: 0.35114661827683447


Train, Epoch 17 / 20:  75%|███████▌  | 1175/1563 [00:34<00:11, 35.08it/s]

batch 1170 loss: 0.3389107197523117


Train, Epoch 17 / 20:  76%|███████▌  | 1187/1563 [00:34<00:10, 35.52it/s]

batch 1180 loss: 0.2655313104391098


Train, Epoch 17 / 20:  76%|███████▋  | 1195/1563 [00:35<00:10, 35.53it/s]

batch 1190 loss: 0.2947178989648819


Train, Epoch 17 / 20:  77%|███████▋  | 1207/1563 [00:35<00:10, 34.53it/s]

batch 1200 loss: 0.3344119697809219


Train, Epoch 17 / 20:  78%|███████▊  | 1215/1563 [00:35<00:10, 34.55it/s]

batch 1210 loss: 0.2753958679735661


Train, Epoch 17 / 20:  79%|███████▊  | 1227/1563 [00:35<00:09, 35.22it/s]

batch 1220 loss: 0.22696401849389075


Train, Epoch 17 / 20:  79%|███████▉  | 1235/1563 [00:36<00:09, 35.19it/s]

batch 1230 loss: 0.3195523589849472


Train, Epoch 17 / 20:  80%|███████▉  | 1247/1563 [00:36<00:08, 35.74it/s]

batch 1240 loss: 0.31115621998906134


Train, Epoch 17 / 20:  80%|████████  | 1255/1563 [00:36<00:08, 35.17it/s]

batch 1250 loss: 0.2945029243826866


Train, Epoch 17 / 20:  81%|████████  | 1267/1563 [00:37<00:08, 35.22it/s]

batch 1260 loss: 0.26297024860978124


Train, Epoch 17 / 20:  82%|████████▏ | 1275/1563 [00:37<00:08, 34.79it/s]

batch 1270 loss: 0.2943595968186855


Train, Epoch 17 / 20:  82%|████████▏ | 1287/1563 [00:37<00:07, 34.54it/s]

batch 1280 loss: 0.2694578982889652


Train, Epoch 17 / 20:  83%|████████▎ | 1295/1563 [00:37<00:07, 34.74it/s]

batch 1290 loss: 0.2528135895729065


Train, Epoch 17 / 20:  84%|████████▎ | 1307/1563 [00:38<00:07, 35.12it/s]

batch 1300 loss: 0.2861866161227226


Train, Epoch 17 / 20:  84%|████████▍ | 1315/1563 [00:38<00:07, 35.17it/s]

batch 1310 loss: 0.28843964338302613


Train, Epoch 17 / 20:  85%|████████▍ | 1327/1563 [00:38<00:06, 34.85it/s]

batch 1320 loss: 0.2765400666743517


Train, Epoch 17 / 20:  85%|████████▌ | 1335/1563 [00:39<00:06, 34.98it/s]

batch 1330 loss: 0.4227045178413391


Train, Epoch 17 / 20:  86%|████████▌ | 1347/1563 [00:39<00:06, 35.42it/s]

batch 1340 loss: 0.35287840068340304


Train, Epoch 17 / 20:  87%|████████▋ | 1355/1563 [00:39<00:05, 35.33it/s]

batch 1350 loss: 0.321539506316185


Train, Epoch 17 / 20:  87%|████████▋ | 1367/1563 [00:39<00:05, 35.40it/s]

batch 1360 loss: 0.3671742543578148


Train, Epoch 17 / 20:  88%|████████▊ | 1375/1563 [00:40<00:05, 35.53it/s]

batch 1370 loss: 0.26787444204092026


Train, Epoch 17 / 20:  89%|████████▊ | 1387/1563 [00:40<00:04, 35.38it/s]

batch 1380 loss: 0.30008028596639635


Train, Epoch 17 / 20:  89%|████████▉ | 1395/1563 [00:40<00:05, 32.87it/s]

batch 1390 loss: 0.3410690680146217


Train, Epoch 17 / 20:  90%|████████▉ | 1403/1563 [00:41<00:04, 32.66it/s]

batch 1400 loss: 0.32207237780094145


Train, Epoch 17 / 20:  91%|█████████ | 1415/1563 [00:41<00:04, 33.75it/s]

batch 1410 loss: 0.19709827676415442


Train, Epoch 17 / 20:  91%|█████████ | 1423/1563 [00:41<00:04, 33.65it/s]

batch 1420 loss: 0.24815452322363854


Train, Epoch 17 / 20:  92%|█████████▏| 1435/1563 [00:42<00:03, 32.23it/s]

batch 1430 loss: 0.32230245471000674


Train, Epoch 17 / 20:  92%|█████████▏| 1443/1563 [00:42<00:03, 31.19it/s]

batch 1440 loss: 0.41872258484363556


Train, Epoch 17 / 20:  93%|█████████▎| 1455/1563 [00:42<00:03, 31.42it/s]

batch 1450 loss: 0.24025121405720712


Train, Epoch 17 / 20:  94%|█████████▎| 1463/1563 [00:42<00:03, 31.17it/s]

batch 1460 loss: 0.28657816126942637


Train, Epoch 17 / 20:  94%|█████████▍| 1475/1563 [00:43<00:02, 31.19it/s]

batch 1470 loss: 0.2435878686606884


Train, Epoch 17 / 20:  95%|█████████▌| 1487/1563 [00:43<00:02, 33.64it/s]

batch 1480 loss: 0.39930441454052923


Train, Epoch 17 / 20:  96%|█████████▌| 1495/1563 [00:43<00:01, 34.30it/s]

batch 1490 loss: 0.35923568308353426


Train, Epoch 17 / 20:  96%|█████████▋| 1507/1563 [00:44<00:01, 34.34it/s]

batch 1500 loss: 0.30009427964687346


Train, Epoch 17 / 20:  97%|█████████▋| 1515/1563 [00:44<00:01, 34.33it/s]

batch 1510 loss: 0.32269559502601625


Train, Epoch 17 / 20:  98%|█████████▊| 1527/1563 [00:44<00:01, 34.54it/s]

batch 1520 loss: 0.27220133543014524


Train, Epoch 17 / 20:  98%|█████████▊| 1535/1563 [00:45<00:00, 33.95it/s]

batch 1530 loss: 0.2845527619123459


Train, Epoch 17 / 20:  99%|█████████▉| 1547/1563 [00:45<00:00, 35.12it/s]

batch 1540 loss: 0.3400721549987793


Train, Epoch 17 / 20:  99%|█████████▉| 1555/1563 [00:45<00:00, 34.94it/s]

batch 1550 loss: 0.2978288508951664


Train, Epoch 17 / 20: 100%|██████████| 1563/1563 [00:45<00:00, 34.08it/s]


batch 1560 loss: 0.2002183012664318


Test, Epoch 17 / 20: 100%|██████████| 1563/1563 [00:22<00:00, 70.88it/s]


Epoch 17, loss: 0.5021372601905465, accuracy: 0.8018


Train, Epoch 18 / 20:   1%|          | 16/1563 [00:00<00:49, 31.34it/s]

batch 10 loss: 0.2913860723376274


Train, Epoch 18 / 20:   2%|▏         | 24/1563 [00:00<00:49, 31.12it/s]

batch 20 loss: 0.27822277545928953


Train, Epoch 18 / 20:   2%|▏         | 36/1563 [00:01<00:48, 31.65it/s]

batch 30 loss: 0.2758005790412426


Train, Epoch 18 / 20:   3%|▎         | 44/1563 [00:01<00:45, 33.36it/s]

batch 40 loss: 0.23572889119386672


Train, Epoch 18 / 20:   4%|▎         | 56/1563 [00:01<00:42, 35.27it/s]

batch 50 loss: 0.2940556965768337


Train, Epoch 18 / 20:   4%|▍         | 64/1563 [00:01<00:43, 34.62it/s]

batch 60 loss: 0.27443417683243754


Train, Epoch 18 / 20:   5%|▍         | 76/1563 [00:02<00:42, 34.75it/s]

batch 70 loss: 0.22326007708907128


Train, Epoch 18 / 20:   5%|▌         | 84/1563 [00:02<00:42, 34.69it/s]

batch 80 loss: 0.32621520161628725


Train, Epoch 18 / 20:   6%|▌         | 96/1563 [00:02<00:42, 34.82it/s]

batch 90 loss: 0.3314915031194687


Train, Epoch 18 / 20:   7%|▋         | 104/1563 [00:03<00:42, 34.19it/s]

batch 100 loss: 0.26671516969799997


Train, Epoch 18 / 20:   7%|▋         | 116/1563 [00:03<00:41, 34.63it/s]

batch 110 loss: 0.24206658378243445


Train, Epoch 18 / 20:   8%|▊         | 124/1563 [00:03<00:41, 34.77it/s]

batch 120 loss: 0.3145850494503975


Train, Epoch 18 / 20:   9%|▊         | 136/1563 [00:04<00:40, 34.85it/s]

batch 130 loss: 0.26657145023345946


Train, Epoch 18 / 20:   9%|▉         | 144/1563 [00:04<00:40, 34.73it/s]

batch 140 loss: 0.27741560973227025


Train, Epoch 18 / 20:  10%|▉         | 156/1563 [00:04<00:40, 34.79it/s]

batch 150 loss: 0.23259869068861008


Train, Epoch 18 / 20:  10%|█         | 164/1563 [00:04<00:40, 34.67it/s]

batch 160 loss: 0.2953597724437714


Train, Epoch 18 / 20:  11%|█▏        | 176/1563 [00:05<00:39, 34.92it/s]

batch 170 loss: 0.2493249773979187


Train, Epoch 18 / 20:  12%|█▏        | 184/1563 [00:05<00:39, 34.87it/s]

batch 180 loss: 0.34515648484230044


Train, Epoch 18 / 20:  13%|█▎        | 196/1563 [00:05<00:38, 35.34it/s]

batch 190 loss: 0.3348328500986099


Train, Epoch 18 / 20:  13%|█▎        | 204/1563 [00:05<00:38, 35.34it/s]

batch 200 loss: 0.25816903859376905


Train, Epoch 18 / 20:  14%|█▍        | 216/1563 [00:06<00:38, 35.24it/s]

batch 210 loss: 0.28673297092318534


Train, Epoch 18 / 20:  14%|█▍        | 224/1563 [00:06<00:37, 35.35it/s]

batch 220 loss: 0.2672546595335007


Train, Epoch 18 / 20:  15%|█▌        | 236/1563 [00:06<00:37, 35.17it/s]

batch 230 loss: 0.29430950060486794


Train, Epoch 18 / 20:  16%|█▌        | 244/1563 [00:07<00:38, 34.68it/s]

batch 240 loss: 0.22571019530296327


Train, Epoch 18 / 20:  16%|█▋        | 256/1563 [00:07<00:37, 34.70it/s]

batch 250 loss: 0.25528875142335894


Train, Epoch 18 / 20:  17%|█▋        | 264/1563 [00:07<00:37, 35.02it/s]

batch 260 loss: 0.3062624603509903


Train, Epoch 18 / 20:  18%|█▊        | 276/1563 [00:08<00:36, 35.34it/s]

batch 270 loss: 0.2514347925782204


Train, Epoch 18 / 20:  18%|█▊        | 284/1563 [00:08<00:36, 34.86it/s]

batch 280 loss: 0.27459786273539066


Train, Epoch 18 / 20:  19%|█▉        | 296/1563 [00:08<00:36, 35.19it/s]

batch 290 loss: 0.29785900712013247


Train, Epoch 18 / 20:  19%|█▉        | 304/1563 [00:08<00:35, 35.05it/s]

batch 300 loss: 0.23689452186226845


Train, Epoch 18 / 20:  20%|██        | 316/1563 [00:09<00:35, 34.98it/s]

batch 310 loss: 0.23312290459871293


Train, Epoch 18 / 20:  21%|██        | 324/1563 [00:09<00:36, 34.33it/s]

batch 320 loss: 0.3388196289539337


Train, Epoch 18 / 20:  21%|██▏       | 336/1563 [00:09<00:35, 34.91it/s]

batch 330 loss: 0.22571961134672164


Train, Epoch 18 / 20:  22%|██▏       | 344/1563 [00:09<00:34, 35.22it/s]

batch 340 loss: 0.1917167693376541


Train, Epoch 18 / 20:  23%|██▎       | 356/1563 [00:10<00:34, 34.93it/s]

batch 350 loss: 0.2502049274742603


Train, Epoch 18 / 20:  23%|██▎       | 364/1563 [00:10<00:34, 34.57it/s]

batch 360 loss: 0.29607356414198877


Train, Epoch 18 / 20:  24%|██▍       | 376/1563 [00:10<00:33, 35.71it/s]

batch 370 loss: 0.2364543728530407


Train, Epoch 18 / 20:  25%|██▍       | 384/1563 [00:11<00:33, 35.69it/s]

batch 380 loss: 0.24751271679997444


Train, Epoch 18 / 20:  25%|██▌       | 396/1563 [00:11<00:35, 32.76it/s]

batch 390 loss: 0.24710992276668547


Train, Epoch 18 / 20:  26%|██▌       | 404/1563 [00:11<00:36, 31.80it/s]

batch 400 loss: 0.2981063283979893


Train, Epoch 18 / 20:  27%|██▋       | 416/1563 [00:12<00:35, 32.11it/s]

batch 410 loss: 0.22210865169763566


Train, Epoch 18 / 20:  27%|██▋       | 424/1563 [00:12<00:35, 31.80it/s]

batch 420 loss: 0.22263188585639


Train, Epoch 18 / 20:  28%|██▊       | 436/1563 [00:12<00:33, 33.34it/s]

batch 430 loss: 0.33034003265202044


Train, Epoch 18 / 20:  28%|██▊       | 444/1563 [00:12<00:33, 33.59it/s]

batch 440 loss: 0.28336409777402877


Train, Epoch 18 / 20:  29%|██▉       | 452/1563 [00:13<00:34, 32.44it/s]

batch 450 loss: 0.2529593467712402


Train, Epoch 18 / 20:  30%|██▉       | 464/1563 [00:13<00:35, 30.72it/s]

batch 460 loss: 0.22436157763004302


Train, Epoch 18 / 20:  30%|███       | 476/1563 [00:14<00:35, 30.36it/s]

batch 470 loss: 0.2306718833744526


Train, Epoch 18 / 20:  31%|███       | 484/1563 [00:14<00:34, 31.11it/s]

batch 480 loss: 0.2234191782772541


Train, Epoch 18 / 20:  32%|███▏      | 496/1563 [00:14<00:32, 33.30it/s]

batch 490 loss: 0.24989304691553116


Train, Epoch 18 / 20:  32%|███▏      | 504/1563 [00:14<00:30, 34.21it/s]

batch 500 loss: 0.31739571765065194


Train, Epoch 18 / 20:  33%|███▎      | 516/1563 [00:15<00:30, 34.77it/s]

batch 510 loss: 0.28067837953567504


Train, Epoch 18 / 20:  34%|███▎      | 524/1563 [00:15<00:29, 34.72it/s]

batch 520 loss: 0.2903119415044785


Train, Epoch 18 / 20:  34%|███▍      | 536/1563 [00:15<00:29, 35.15it/s]

batch 530 loss: 0.2722626429051161


Train, Epoch 18 / 20:  35%|███▍      | 544/1563 [00:16<00:29, 35.01it/s]

batch 540 loss: 0.3149227432906628


Train, Epoch 18 / 20:  36%|███▌      | 556/1563 [00:16<00:28, 35.25it/s]

batch 550 loss: 0.31147670447826387


Train, Epoch 18 / 20:  36%|███▌      | 564/1563 [00:16<00:28, 34.98it/s]

batch 560 loss: 0.4409635901451111


Train, Epoch 18 / 20:  37%|███▋      | 576/1563 [00:16<00:27, 35.58it/s]

batch 570 loss: 0.27838204354047774


Train, Epoch 18 / 20:  37%|███▋      | 584/1563 [00:17<00:27, 35.34it/s]

batch 580 loss: 0.29115314185619356


Train, Epoch 18 / 20:  38%|███▊      | 596/1563 [00:17<00:27, 35.04it/s]

batch 590 loss: 0.2973136201500893


Train, Epoch 18 / 20:  39%|███▊      | 604/1563 [00:17<00:27, 35.26it/s]

batch 600 loss: 0.24857365041971208


Train, Epoch 18 / 20:  39%|███▉      | 616/1563 [00:18<00:26, 35.38it/s]

batch 610 loss: 0.25115970745682714


Train, Epoch 18 / 20:  40%|███▉      | 624/1563 [00:18<00:26, 35.35it/s]

batch 620 loss: 0.2893146276473999


Train, Epoch 18 / 20:  41%|████      | 636/1563 [00:18<00:26, 34.63it/s]

batch 630 loss: 0.22750824317336082


Train, Epoch 18 / 20:  41%|████      | 644/1563 [00:18<00:26, 35.01it/s]

batch 640 loss: 0.25352791100740435


Train, Epoch 18 / 20:  42%|████▏     | 656/1563 [00:19<00:25, 35.19it/s]

batch 650 loss: 0.34157266691327093


Train, Epoch 18 / 20:  42%|████▏     | 664/1563 [00:19<00:25, 34.65it/s]

batch 660 loss: 0.30206849351525306


Train, Epoch 18 / 20:  43%|████▎     | 676/1563 [00:19<00:25, 34.91it/s]

batch 670 loss: 0.3549764916300774


Train, Epoch 18 / 20:  44%|████▍     | 684/1563 [00:20<00:25, 35.07it/s]

batch 680 loss: 0.22070136815309524


Train, Epoch 18 / 20:  45%|████▍     | 696/1563 [00:20<00:24, 35.05it/s]

batch 690 loss: 0.2702889874577522


Train, Epoch 18 / 20:  45%|████▌     | 704/1563 [00:20<00:24, 34.49it/s]

batch 700 loss: 0.25456608533859254


Train, Epoch 18 / 20:  46%|████▌     | 716/1563 [00:20<00:24, 34.79it/s]

batch 710 loss: 0.24962312281131743


Train, Epoch 18 / 20:  46%|████▋     | 724/1563 [00:21<00:24, 34.55it/s]

batch 720 loss: 0.3178047761321068


Train, Epoch 18 / 20:  47%|████▋     | 736/1563 [00:21<00:23, 34.87it/s]

batch 730 loss: 0.2914267688989639


Train, Epoch 18 / 20:  48%|████▊     | 744/1563 [00:21<00:24, 34.12it/s]

batch 740 loss: 0.2937297962605953


Train, Epoch 18 / 20:  48%|████▊     | 756/1563 [00:22<00:23, 34.86it/s]

batch 750 loss: 0.2834074102342129


Train, Epoch 18 / 20:  49%|████▉     | 764/1563 [00:22<00:23, 34.69it/s]

batch 760 loss: 0.3304606586694717


Train, Epoch 18 / 20:  50%|████▉     | 776/1563 [00:22<00:22, 34.62it/s]

batch 770 loss: 0.1964146725833416


Train, Epoch 18 / 20:  50%|█████     | 784/1563 [00:22<00:22, 35.27it/s]

batch 780 loss: 0.3119204998016357


Train, Epoch 18 / 20:  51%|█████     | 796/1563 [00:23<00:21, 35.63it/s]

batch 790 loss: 0.2840342611074448


Train, Epoch 18 / 20:  51%|█████▏    | 804/1563 [00:23<00:21, 35.47it/s]

batch 800 loss: 0.2644523434340954


Train, Epoch 18 / 20:  52%|█████▏    | 816/1563 [00:23<00:21, 34.68it/s]

batch 810 loss: 0.318715500831604


Train, Epoch 18 / 20:  53%|█████▎    | 824/1563 [00:24<00:20, 35.38it/s]

batch 820 loss: 0.29190465807914734


Train, Epoch 18 / 20:  53%|█████▎    | 836/1563 [00:24<00:21, 33.52it/s]

batch 830 loss: 0.339953239262104


Train, Epoch 18 / 20:  54%|█████▍    | 844/1563 [00:24<00:22, 31.91it/s]

batch 840 loss: 0.2786516010761261


Train, Epoch 18 / 20:  55%|█████▍    | 856/1563 [00:25<00:23, 30.37it/s]

batch 850 loss: 0.3366820469498634


Train, Epoch 18 / 20:  55%|█████▌    | 864/1563 [00:25<00:22, 31.02it/s]

batch 860 loss: 0.33535833805799486


Train, Epoch 18 / 20:  56%|█████▌    | 876/1563 [00:25<00:22, 30.81it/s]

batch 870 loss: 0.30651842057704926


Train, Epoch 18 / 20:  57%|█████▋    | 884/1563 [00:25<00:21, 31.71it/s]

batch 880 loss: 0.2985904172062874


Train, Epoch 18 / 20:  57%|█████▋    | 896/1563 [00:26<00:21, 31.01it/s]

batch 890 loss: 0.28265646994113924


Train, Epoch 18 / 20:  58%|█████▊    | 904/1563 [00:26<00:21, 30.37it/s]

batch 900 loss: 0.18993717655539513


Train, Epoch 18 / 20:  59%|█████▊    | 916/1563 [00:27<00:21, 30.50it/s]

batch 910 loss: 0.277054525911808


Train, Epoch 18 / 20:  59%|█████▉    | 924/1563 [00:27<00:19, 32.68it/s]

batch 920 loss: 0.24867507964372634


Train, Epoch 18 / 20:  60%|█████▉    | 936/1563 [00:27<00:18, 34.69it/s]

batch 930 loss: 0.2953652560710907


Train, Epoch 18 / 20:  60%|██████    | 944/1563 [00:27<00:18, 34.35it/s]

batch 940 loss: 0.2577605158090591


Train, Epoch 18 / 20:  61%|██████    | 956/1563 [00:28<00:17, 34.95it/s]

batch 950 loss: 0.26426994502544404


Train, Epoch 18 / 20:  62%|██████▏   | 964/1563 [00:28<00:16, 35.49it/s]

batch 960 loss: 0.401631124317646


Train, Epoch 18 / 20:  62%|██████▏   | 976/1563 [00:28<00:16, 35.25it/s]

batch 970 loss: 0.19619911760091782


Train, Epoch 18 / 20:  63%|██████▎   | 984/1563 [00:28<00:16, 34.94it/s]

batch 980 loss: 0.2607506692409515


Train, Epoch 18 / 20:  64%|██████▎   | 996/1563 [00:29<00:16, 35.27it/s]

batch 990 loss: 0.27644719183444977


Train, Epoch 18 / 20:  64%|██████▍   | 1004/1563 [00:29<00:15, 35.26it/s]

batch 1000 loss: 0.30942430794239045


Train, Epoch 18 / 20:  65%|██████▌   | 1016/1563 [00:29<00:15, 35.40it/s]

batch 1010 loss: 0.23389726877212524


Train, Epoch 18 / 20:  66%|██████▌   | 1024/1563 [00:30<00:15, 34.86it/s]

batch 1020 loss: 0.25922557413578035


Train, Epoch 18 / 20:  66%|██████▋   | 1036/1563 [00:30<00:14, 35.15it/s]

batch 1030 loss: 0.2972749412059784


Train, Epoch 18 / 20:  67%|██████▋   | 1044/1563 [00:30<00:14, 35.08it/s]

batch 1040 loss: 0.24965689182281495


Train, Epoch 18 / 20:  68%|██████▊   | 1056/1563 [00:31<00:14, 35.01it/s]

batch 1050 loss: 0.2545530691742897


Train, Epoch 18 / 20:  68%|██████▊   | 1064/1563 [00:31<00:14, 35.15it/s]

batch 1060 loss: 0.2731126956641674


Train, Epoch 18 / 20:  69%|██████▉   | 1076/1563 [00:31<00:13, 35.03it/s]

batch 1070 loss: 0.29429412111639974


Train, Epoch 18 / 20:  69%|██████▉   | 1084/1563 [00:31<00:13, 34.62it/s]

batch 1080 loss: 0.35300830751657486


Train, Epoch 18 / 20:  70%|███████   | 1096/1563 [00:32<00:13, 34.98it/s]

batch 1090 loss: 0.34348210841417315


Train, Epoch 18 / 20:  71%|███████   | 1104/1563 [00:32<00:13, 35.15it/s]

batch 1100 loss: 0.31661276072263717


Train, Epoch 18 / 20:  71%|███████▏  | 1116/1563 [00:32<00:12, 35.88it/s]

batch 1110 loss: 0.35191602259874344


Train, Epoch 18 / 20:  72%|███████▏  | 1124/1563 [00:32<00:12, 34.57it/s]

batch 1120 loss: 0.29318795800209047


Train, Epoch 18 / 20:  73%|███████▎  | 1136/1563 [00:33<00:12, 34.72it/s]

batch 1130 loss: 0.2377959005534649


Train, Epoch 18 / 20:  73%|███████▎  | 1144/1563 [00:33<00:11, 35.07it/s]

batch 1140 loss: 0.326076440513134


Train, Epoch 18 / 20:  74%|███████▍  | 1156/1563 [00:33<00:11, 35.16it/s]

batch 1150 loss: 0.2994667649269104


Train, Epoch 18 / 20:  74%|███████▍  | 1164/1563 [00:34<00:11, 34.87it/s]

batch 1160 loss: 0.27684821784496305


Train, Epoch 18 / 20:  75%|███████▌  | 1176/1563 [00:34<00:11, 35.14it/s]

batch 1170 loss: 0.2597404092550278


Train, Epoch 18 / 20:  76%|███████▌  | 1184/1563 [00:34<00:10, 35.67it/s]

batch 1180 loss: 0.2808838650584221


Train, Epoch 18 / 20:  77%|███████▋  | 1196/1563 [00:35<00:10, 35.55it/s]

batch 1190 loss: 0.31470080465078354


Train, Epoch 18 / 20:  77%|███████▋  | 1204/1563 [00:35<00:10, 35.05it/s]

batch 1200 loss: 0.3465489663183689


Train, Epoch 18 / 20:  78%|███████▊  | 1216/1563 [00:35<00:09, 35.16it/s]

batch 1210 loss: 0.28431630358099935


Train, Epoch 18 / 20:  78%|███████▊  | 1224/1563 [00:35<00:09, 35.27it/s]

batch 1220 loss: 0.39027392864227295


Train, Epoch 18 / 20:  79%|███████▉  | 1236/1563 [00:36<00:09, 35.16it/s]

batch 1230 loss: 0.30373456999659537


Train, Epoch 18 / 20:  80%|███████▉  | 1244/1563 [00:36<00:09, 35.02it/s]

batch 1240 loss: 0.31764909625053406


Train, Epoch 18 / 20:  80%|████████  | 1256/1563 [00:36<00:08, 35.44it/s]

batch 1250 loss: 0.3280190259218216


Train, Epoch 18 / 20:  81%|████████  | 1264/1563 [00:36<00:08, 35.18it/s]

batch 1260 loss: 0.314481982588768


Train, Epoch 18 / 20:  82%|████████▏ | 1276/1563 [00:37<00:08, 32.33it/s]

batch 1270 loss: 0.29496836811304095


Train, Epoch 18 / 20:  82%|████████▏ | 1284/1563 [00:37<00:08, 33.20it/s]

batch 1280 loss: 0.2265622116625309


Train, Epoch 18 / 20:  83%|████████▎ | 1296/1563 [00:37<00:08, 32.56it/s]

batch 1290 loss: 0.2438632696866989


Train, Epoch 18 / 20:  83%|████████▎ | 1304/1563 [00:38<00:07, 33.47it/s]

batch 1300 loss: 0.20278921648859977


Train, Epoch 18 / 20:  84%|████████▍ | 1316/1563 [00:38<00:07, 33.33it/s]

batch 1310 loss: 0.25111310221254823


Train, Epoch 18 / 20:  85%|████████▍ | 1324/1563 [00:38<00:07, 32.27it/s]

batch 1320 loss: 0.25530241802334785


Train, Epoch 18 / 20:  85%|████████▌ | 1336/1563 [00:39<00:07, 31.94it/s]

batch 1330 loss: 0.3490730293095112


Train, Epoch 18 / 20:  86%|████████▌ | 1344/1563 [00:39<00:06, 32.58it/s]

batch 1340 loss: 0.21895634829998017


Train, Epoch 18 / 20:  87%|████████▋ | 1356/1563 [00:39<00:06, 31.49it/s]

batch 1350 loss: 0.35907389372587206


Train, Epoch 18 / 20:  87%|████████▋ | 1364/1563 [00:40<00:06, 30.87it/s]

batch 1360 loss: 0.32170536294579505


Train, Epoch 18 / 20:  88%|████████▊ | 1376/1563 [00:40<00:05, 32.56it/s]

batch 1370 loss: 0.3923538252711296


Train, Epoch 18 / 20:  89%|████████▊ | 1384/1563 [00:40<00:05, 34.06it/s]

batch 1380 loss: 0.2723869368433952


Train, Epoch 18 / 20:  89%|████████▉ | 1396/1563 [00:41<00:04, 34.52it/s]

batch 1390 loss: 0.2617212861776352


Train, Epoch 18 / 20:  90%|████████▉ | 1404/1563 [00:41<00:04, 34.72it/s]

batch 1400 loss: 0.27293281629681587


Train, Epoch 18 / 20:  91%|█████████ | 1416/1563 [00:41<00:04, 35.01it/s]

batch 1410 loss: 0.3257268160581589


Train, Epoch 18 / 20:  91%|█████████ | 1424/1563 [00:41<00:03, 34.98it/s]

batch 1420 loss: 0.37766765654087064


Train, Epoch 18 / 20:  92%|█████████▏| 1436/1563 [00:42<00:03, 34.94it/s]

batch 1430 loss: 0.36033026725053785


Train, Epoch 18 / 20:  92%|█████████▏| 1444/1563 [00:42<00:03, 34.56it/s]

batch 1440 loss: 0.25130273699760436


Train, Epoch 18 / 20:  93%|█████████▎| 1456/1563 [00:42<00:03, 34.82it/s]

batch 1450 loss: 0.3606366395950317


Train, Epoch 18 / 20:  94%|█████████▎| 1464/1563 [00:42<00:02, 35.12it/s]

batch 1460 loss: 0.3159749209880829


Train, Epoch 18 / 20:  94%|█████████▍| 1476/1563 [00:43<00:02, 35.34it/s]

batch 1470 loss: 0.2561620309948921


Train, Epoch 18 / 20:  95%|█████████▍| 1484/1563 [00:43<00:02, 34.00it/s]

batch 1480 loss: 0.28325964212417604


Train, Epoch 18 / 20:  96%|█████████▌| 1496/1563 [00:43<00:01, 34.23it/s]

batch 1490 loss: 0.22084134444594383


Train, Epoch 18 / 20:  96%|█████████▌| 1504/1563 [00:44<00:01, 34.50it/s]

batch 1500 loss: 0.2641832701861858


Train, Epoch 18 / 20:  97%|█████████▋| 1516/1563 [00:44<00:01, 34.78it/s]

batch 1510 loss: 0.3091324374079704


Train, Epoch 18 / 20:  98%|█████████▊| 1524/1563 [00:44<00:01, 34.81it/s]

batch 1520 loss: 0.274239020049572


Train, Epoch 18 / 20:  98%|█████████▊| 1536/1563 [00:45<00:00, 34.07it/s]

batch 1530 loss: 0.2910284370183945


Train, Epoch 18 / 20:  99%|█████████▉| 1544/1563 [00:45<00:00, 34.66it/s]

batch 1540 loss: 0.262040825933218


Train, Epoch 18 / 20: 100%|█████████▉| 1556/1563 [00:45<00:00, 34.74it/s]

batch 1550 loss: 0.2823863625526428


Train, Epoch 18 / 20: 100%|██████████| 1563/1563 [00:45<00:00, 34.09it/s]


batch 1560 loss: 0.3227430582046509


Test, Epoch 18 / 20: 100%|██████████| 1563/1563 [00:22<00:00, 68.49it/s]


Epoch 18, loss: 0.5067251449285448, accuracy: 0.79924


Train, Epoch 19 / 20:   1%|          | 16/1563 [00:00<00:44, 34.94it/s]

batch 10 loss: 0.25679640620946886


Train, Epoch 19 / 20:   2%|▏         | 24/1563 [00:00<00:44, 34.92it/s]

batch 20 loss: 0.2783501699566841


Train, Epoch 19 / 20:   2%|▏         | 36/1563 [00:01<00:43, 34.91it/s]

batch 30 loss: 0.22009869739413263


Train, Epoch 19 / 20:   3%|▎         | 44/1563 [00:01<00:43, 35.07it/s]

batch 40 loss: 0.26816307455301286


Train, Epoch 19 / 20:   4%|▎         | 56/1563 [00:01<00:42, 35.23it/s]

batch 50 loss: 0.1914783351123333


Train, Epoch 19 / 20:   4%|▍         | 64/1563 [00:01<00:42, 34.91it/s]

batch 60 loss: 0.2551088474690914


Train, Epoch 19 / 20:   5%|▍         | 76/1563 [00:02<00:42, 35.03it/s]

batch 70 loss: 0.24888014867901803


Train, Epoch 19 / 20:   5%|▌         | 84/1563 [00:02<00:42, 34.89it/s]

batch 80 loss: 0.20159289091825486


Train, Epoch 19 / 20:   6%|▌         | 96/1563 [00:02<00:42, 34.83it/s]

batch 90 loss: 0.23059890493750573


Train, Epoch 19 / 20:   7%|▋         | 104/1563 [00:02<00:41, 34.94it/s]

batch 100 loss: 0.3160913199186325


Train, Epoch 19 / 20:   7%|▋         | 116/1563 [00:03<00:41, 35.12it/s]

batch 110 loss: 0.24912127032876014


Train, Epoch 19 / 20:   8%|▊         | 124/1563 [00:03<00:40, 35.17it/s]

batch 120 loss: 0.3238163344562054


Train, Epoch 19 / 20:   9%|▊         | 136/1563 [00:03<00:40, 35.19it/s]

batch 130 loss: 0.32082364410161973


Train, Epoch 19 / 20:   9%|▉         | 144/1563 [00:04<00:40, 35.11it/s]

batch 140 loss: 0.3083286970853806


Train, Epoch 19 / 20:  10%|▉         | 156/1563 [00:04<00:40, 34.91it/s]

batch 150 loss: 0.21452443674206734


Train, Epoch 19 / 20:  10%|█         | 164/1563 [00:04<00:39, 35.01it/s]

batch 160 loss: 0.2574801817536354


Train, Epoch 19 / 20:  11%|█▏        | 176/1563 [00:05<00:39, 35.08it/s]

batch 170 loss: 0.3026771701872349


Train, Epoch 19 / 20:  12%|█▏        | 184/1563 [00:05<00:39, 34.97it/s]

batch 180 loss: 0.3151994258165359


Train, Epoch 19 / 20:  13%|█▎        | 196/1563 [00:05<00:39, 34.26it/s]

batch 190 loss: 0.19608130007982255


Train, Epoch 19 / 20:  13%|█▎        | 204/1563 [00:05<00:39, 34.20it/s]

batch 200 loss: 0.28718101382255556


Train, Epoch 19 / 20:  14%|█▍        | 216/1563 [00:06<00:39, 34.53it/s]

batch 210 loss: 0.2692670226097107


Train, Epoch 19 / 20:  14%|█▍        | 224/1563 [00:06<00:38, 34.69it/s]

batch 220 loss: 0.33612471967935564


Train, Epoch 19 / 20:  15%|█▌        | 236/1563 [00:06<00:37, 35.23it/s]

batch 230 loss: 0.2185361623764038


Train, Epoch 19 / 20:  16%|█▌        | 244/1563 [00:07<00:38, 34.15it/s]

batch 240 loss: 0.317317770421505


Train, Epoch 19 / 20:  16%|█▋        | 256/1563 [00:07<00:40, 32.01it/s]

batch 250 loss: 0.2343500442802906


Train, Epoch 19 / 20:  17%|█▋        | 264/1563 [00:07<00:40, 31.85it/s]

batch 260 loss: 0.3285322144627571


Train, Epoch 19 / 20:  18%|█▊        | 276/1563 [00:08<00:38, 33.06it/s]

batch 270 loss: 0.2751641571521759


Train, Epoch 19 / 20:  18%|█▊        | 284/1563 [00:08<00:39, 32.18it/s]

batch 280 loss: 0.2629791185259819


Train, Epoch 19 / 20:  19%|█▉        | 296/1563 [00:08<00:37, 33.35it/s]

batch 290 loss: 0.2852102376520634


Train, Epoch 19 / 20:  19%|█▉        | 304/1563 [00:08<00:38, 32.89it/s]

batch 300 loss: 0.27747093737125395


Train, Epoch 19 / 20:  20%|██        | 316/1563 [00:09<00:38, 32.41it/s]

batch 310 loss: 0.20720432624220847


Train, Epoch 19 / 20:  21%|██        | 324/1563 [00:09<00:40, 30.85it/s]

batch 320 loss: 0.3066599279642105


Train, Epoch 19 / 20:  21%|██        | 332/1563 [00:09<00:39, 30.90it/s]

batch 330 loss: 0.27878201082348825


Train, Epoch 19 / 20:  22%|██▏       | 344/1563 [00:10<00:38, 31.60it/s]

batch 340 loss: 0.3052326083183289


Train, Epoch 19 / 20:  23%|██▎       | 356/1563 [00:10<00:35, 33.68it/s]

batch 350 loss: 0.2577087201178074


Train, Epoch 19 / 20:  23%|██▎       | 364/1563 [00:10<00:34, 34.71it/s]

batch 360 loss: 0.3131575908511877


Train, Epoch 19 / 20:  24%|██▍       | 376/1563 [00:11<00:34, 34.32it/s]

batch 370 loss: 0.22487040013074874


Train, Epoch 19 / 20:  25%|██▍       | 384/1563 [00:11<00:34, 34.20it/s]

batch 380 loss: 0.3095636427402496


Train, Epoch 19 / 20:  25%|██▌       | 396/1563 [00:11<00:33, 34.47it/s]

batch 390 loss: 0.19347461089491844


Train, Epoch 19 / 20:  26%|██▌       | 404/1563 [00:11<00:33, 34.83it/s]

batch 400 loss: 0.2537045940756798


Train, Epoch 19 / 20:  27%|██▋       | 416/1563 [00:12<00:32, 35.05it/s]

batch 410 loss: 0.30433969497680663


Train, Epoch 19 / 20:  27%|██▋       | 424/1563 [00:12<00:32, 35.15it/s]

batch 420 loss: 0.21645721569657325


Train, Epoch 19 / 20:  28%|██▊       | 436/1563 [00:12<00:32, 35.04it/s]

batch 430 loss: 0.21428203135728835


Train, Epoch 19 / 20:  28%|██▊       | 444/1563 [00:13<00:31, 35.16it/s]

batch 440 loss: 0.25517603084445


Train, Epoch 19 / 20:  29%|██▉       | 456/1563 [00:13<00:31, 35.42it/s]

batch 450 loss: 0.2789190858602524


Train, Epoch 19 / 20:  30%|██▉       | 464/1563 [00:13<00:31, 34.66it/s]

batch 460 loss: 0.33177109211683276


Train, Epoch 19 / 20:  30%|███       | 476/1563 [00:13<00:30, 35.48it/s]

batch 470 loss: 0.24402515813708306


Train, Epoch 19 / 20:  31%|███       | 484/1563 [00:14<00:30, 34.83it/s]

batch 480 loss: 0.31671746671199796


Train, Epoch 19 / 20:  32%|███▏      | 496/1563 [00:14<00:30, 34.88it/s]

batch 490 loss: 0.29685475006699563


Train, Epoch 19 / 20:  32%|███▏      | 504/1563 [00:14<00:30, 34.96it/s]

batch 500 loss: 0.24949614629149436


Train, Epoch 19 / 20:  33%|███▎      | 516/1563 [00:15<00:29, 35.00it/s]

batch 510 loss: 0.24016788601875305


Train, Epoch 19 / 20:  34%|███▎      | 524/1563 [00:15<00:29, 34.96it/s]

batch 520 loss: 0.32406034916639326


Train, Epoch 19 / 20:  34%|███▍      | 536/1563 [00:15<00:29, 35.27it/s]

batch 530 loss: 0.23293919190764428


Train, Epoch 19 / 20:  35%|███▍      | 544/1563 [00:15<00:28, 35.39it/s]

batch 540 loss: 0.2697943300008774


Train, Epoch 19 / 20:  36%|███▌      | 556/1563 [00:16<00:28, 35.16it/s]

batch 550 loss: 0.2576124109327793


Train, Epoch 19 / 20:  36%|███▌      | 564/1563 [00:16<00:28, 34.76it/s]

batch 560 loss: 0.3020822688937187


Train, Epoch 19 / 20:  37%|███▋      | 576/1563 [00:16<00:28, 35.01it/s]

batch 570 loss: 0.33598182573914526


Train, Epoch 19 / 20:  37%|███▋      | 584/1563 [00:17<00:27, 34.99it/s]

batch 580 loss: 0.3811366893351078


Train, Epoch 19 / 20:  38%|███▊      | 596/1563 [00:17<00:27, 34.94it/s]

batch 590 loss: 0.32156978249549867


Train, Epoch 19 / 20:  39%|███▊      | 604/1563 [00:17<00:27, 34.97it/s]

batch 600 loss: 0.3102601572871208


Train, Epoch 19 / 20:  39%|███▉      | 616/1563 [00:17<00:26, 35.15it/s]

batch 610 loss: 0.25337583422660825


Train, Epoch 19 / 20:  40%|███▉      | 624/1563 [00:18<00:26, 35.31it/s]

batch 620 loss: 0.23956810757517816


Train, Epoch 19 / 20:  41%|████      | 636/1563 [00:18<00:26, 34.69it/s]

batch 630 loss: 0.2786961607635021


Train, Epoch 19 / 20:  41%|████      | 644/1563 [00:18<00:26, 34.93it/s]

batch 640 loss: 0.2991203047335148


Train, Epoch 19 / 20:  42%|████▏     | 656/1563 [00:19<00:25, 34.97it/s]

batch 650 loss: 0.2024437114596367


Train, Epoch 19 / 20:  42%|████▏     | 664/1563 [00:19<00:25, 34.88it/s]

batch 660 loss: 0.23408055454492568


Train, Epoch 19 / 20:  43%|████▎     | 676/1563 [00:19<00:25, 35.00it/s]

batch 670 loss: 0.247091081738472


Train, Epoch 19 / 20:  44%|████▍     | 684/1563 [00:19<00:24, 35.42it/s]

batch 680 loss: 0.22639297433197497


Train, Epoch 19 / 20:  45%|████▍     | 696/1563 [00:20<00:25, 33.35it/s]

batch 690 loss: 0.281131649017334


Train, Epoch 19 / 20:  45%|████▌     | 704/1563 [00:20<00:27, 30.90it/s]

batch 700 loss: 0.30389371514320374


Train, Epoch 19 / 20:  46%|████▌     | 716/1563 [00:20<00:26, 32.29it/s]

batch 710 loss: 0.268491718173027


Train, Epoch 19 / 20:  46%|████▋     | 724/1563 [00:21<00:25, 32.45it/s]

batch 720 loss: 0.25629593171179293


Train, Epoch 19 / 20:  47%|████▋     | 736/1563 [00:21<00:25, 32.81it/s]

batch 730 loss: 0.28994191288948057


Train, Epoch 19 / 20:  48%|████▊     | 744/1563 [00:21<00:24, 33.59it/s]

batch 740 loss: 0.296385445818305


Train, Epoch 19 / 20:  48%|████▊     | 756/1563 [00:22<00:25, 31.92it/s]

batch 750 loss: 0.2408956430852413


Train, Epoch 19 / 20:  49%|████▉     | 764/1563 [00:22<00:26, 30.53it/s]

batch 760 loss: 0.2280917778611183


Train, Epoch 19 / 20:  50%|████▉     | 776/1563 [00:22<00:25, 31.21it/s]

batch 770 loss: 0.28892737329006196


Train, Epoch 19 / 20:  50%|█████     | 784/1563 [00:23<00:25, 30.07it/s]

batch 780 loss: 0.23302064090967178


Train, Epoch 19 / 20:  51%|█████     | 796/1563 [00:23<00:22, 33.53it/s]

batch 790 loss: 0.20261310636997223


Train, Epoch 19 / 20:  51%|█████▏    | 804/1563 [00:23<00:22, 33.95it/s]

batch 800 loss: 0.28489978685975076


Train, Epoch 19 / 20:  52%|█████▏    | 816/1563 [00:24<00:21, 34.39it/s]

batch 810 loss: 0.3283218629658222


Train, Epoch 19 / 20:  53%|█████▎    | 824/1563 [00:24<00:21, 34.80it/s]

batch 820 loss: 0.2814443469047546


Train, Epoch 19 / 20:  53%|█████▎    | 836/1563 [00:24<00:20, 34.98it/s]

batch 830 loss: 0.2689217463135719


Train, Epoch 19 / 20:  54%|█████▍    | 844/1563 [00:24<00:20, 34.67it/s]

batch 840 loss: 0.23336454406380652


Train, Epoch 19 / 20:  55%|█████▍    | 856/1563 [00:25<00:20, 34.56it/s]

batch 850 loss: 0.32157219350337984


Train, Epoch 19 / 20:  55%|█████▌    | 864/1563 [00:25<00:19, 35.03it/s]

batch 860 loss: 0.22000853344798088


Train, Epoch 19 / 20:  56%|█████▌    | 876/1563 [00:25<00:19, 34.77it/s]

batch 870 loss: 0.39201660007238387


Train, Epoch 19 / 20:  57%|█████▋    | 884/1563 [00:25<00:19, 34.95it/s]

batch 880 loss: 0.2709911338984966


Train, Epoch 19 / 20:  57%|█████▋    | 896/1563 [00:26<00:19, 34.85it/s]

batch 890 loss: 0.2539346069097519


Train, Epoch 19 / 20:  58%|█████▊    | 904/1563 [00:26<00:18, 35.01it/s]

batch 900 loss: 0.3199883759021759


Train, Epoch 19 / 20:  59%|█████▊    | 916/1563 [00:26<00:18, 35.14it/s]

batch 910 loss: 0.3209636628627777


Train, Epoch 19 / 20:  59%|█████▉    | 924/1563 [00:27<00:18, 34.99it/s]

batch 920 loss: 0.30819712579250336


Train, Epoch 19 / 20:  60%|█████▉    | 936/1563 [00:27<00:17, 35.15it/s]

batch 930 loss: 0.2930463865399361


Train, Epoch 19 / 20:  60%|██████    | 944/1563 [00:27<00:18, 34.34it/s]

batch 940 loss: 0.28937812000513075


Train, Epoch 19 / 20:  61%|██████    | 956/1563 [00:28<00:17, 34.94it/s]

batch 950 loss: 0.24674935787916183


Train, Epoch 19 / 20:  62%|██████▏   | 964/1563 [00:28<00:17, 35.18it/s]

batch 960 loss: 0.29047550112009046


Train, Epoch 19 / 20:  62%|██████▏   | 976/1563 [00:28<00:16, 35.22it/s]

batch 970 loss: 0.2597642563283443


Train, Epoch 19 / 20:  63%|██████▎   | 984/1563 [00:28<00:16, 34.61it/s]

batch 980 loss: 0.26629878729581835


Train, Epoch 19 / 20:  64%|██████▎   | 996/1563 [00:29<00:16, 34.49it/s]

batch 990 loss: 0.2674160420894623


Train, Epoch 19 / 20:  64%|██████▍   | 1004/1563 [00:29<00:16, 34.74it/s]

batch 1000 loss: 0.22399636283516883


Train, Epoch 19 / 20:  65%|██████▌   | 1016/1563 [00:29<00:15, 34.38it/s]

batch 1010 loss: 0.3964705467224121


Train, Epoch 19 / 20:  66%|██████▌   | 1024/1563 [00:30<00:15, 34.71it/s]

batch 1020 loss: 0.26911973506212233


Train, Epoch 19 / 20:  66%|██████▋   | 1036/1563 [00:30<00:14, 35.21it/s]

batch 1030 loss: 0.2980181336402893


Train, Epoch 19 / 20:  67%|██████▋   | 1044/1563 [00:30<00:14, 35.29it/s]

batch 1040 loss: 0.2456011563539505


Train, Epoch 19 / 20:  68%|██████▊   | 1056/1563 [00:30<00:14, 35.06it/s]

batch 1050 loss: 0.23094588220119477


Train, Epoch 19 / 20:  68%|██████▊   | 1064/1563 [00:31<00:14, 34.85it/s]

batch 1060 loss: 0.22379098534584047


Train, Epoch 19 / 20:  69%|██████▉   | 1076/1563 [00:31<00:14, 34.46it/s]

batch 1070 loss: 0.2839783027768135


Train, Epoch 19 / 20:  69%|██████▉   | 1084/1563 [00:31<00:13, 34.38it/s]

batch 1080 loss: 0.33619084805250166


Train, Epoch 19 / 20:  70%|███████   | 1096/1563 [00:32<00:13, 34.52it/s]

batch 1090 loss: 0.31088785231113436


Train, Epoch 19 / 20:  71%|███████   | 1104/1563 [00:32<00:13, 34.73it/s]

batch 1100 loss: 0.21485889181494713


Train, Epoch 19 / 20:  71%|███████▏  | 1116/1563 [00:32<00:12, 34.81it/s]

batch 1110 loss: 0.2569483995437622


Train, Epoch 19 / 20:  72%|███████▏  | 1124/1563 [00:32<00:12, 34.35it/s]

batch 1120 loss: 0.23832281827926635


Train, Epoch 19 / 20:  73%|███████▎  | 1136/1563 [00:33<00:13, 32.10it/s]

batch 1130 loss: 0.27377943247556685


Train, Epoch 19 / 20:  73%|███████▎  | 1144/1563 [00:33<00:13, 31.47it/s]

batch 1140 loss: 0.3100231148302555


Train, Epoch 19 / 20:  74%|███████▍  | 1156/1563 [00:33<00:13, 31.25it/s]

batch 1150 loss: 0.22998880967497826


Train, Epoch 19 / 20:  74%|███████▍  | 1163/1563 [00:34<00:13, 29.21it/s]

batch 1160 loss: 0.28066563457250593


Train, Epoch 19 / 20:  75%|███████▌  | 1175/1563 [00:34<00:12, 30.83it/s]

batch 1170 loss: 0.2247456394135952


Train, Epoch 19 / 20:  76%|███████▌  | 1183/1563 [00:34<00:12, 31.51it/s]

batch 1180 loss: 0.2790307253599167


Train, Epoch 19 / 20:  76%|███████▋  | 1195/1563 [00:35<00:12, 30.15it/s]

batch 1190 loss: 0.2743070349097252


Train, Epoch 19 / 20:  77%|███████▋  | 1203/1563 [00:35<00:12, 29.97it/s]

batch 1200 loss: 0.2337845265865326


Train, Epoch 19 / 20:  78%|███████▊  | 1215/1563 [00:35<00:11, 29.67it/s]

batch 1210 loss: 0.27779787182807925


Train, Epoch 19 / 20:  78%|███████▊  | 1225/1563 [00:36<00:11, 29.92it/s]

batch 1220 loss: 0.19454186838120222


Train, Epoch 19 / 20:  79%|███████▉  | 1233/1563 [00:36<00:10, 31.86it/s]

batch 1230 loss: 0.26973599195480347


Train, Epoch 19 / 20:  80%|███████▉  | 1245/1563 [00:36<00:09, 33.60it/s]

batch 1240 loss: 0.27698518633842467


Train, Epoch 19 / 20:  80%|████████  | 1257/1563 [00:37<00:08, 34.56it/s]

batch 1250 loss: 0.2896160513162613


Train, Epoch 19 / 20:  81%|████████  | 1265/1563 [00:37<00:08, 34.76it/s]

batch 1260 loss: 0.3002267010509968


Train, Epoch 19 / 20:  82%|████████▏ | 1277/1563 [00:37<00:08, 34.99it/s]

batch 1270 loss: 0.25692338943481446


Train, Epoch 19 / 20:  82%|████████▏ | 1285/1563 [00:37<00:07, 35.28it/s]

batch 1280 loss: 0.22955768704414367


Train, Epoch 19 / 20:  83%|████████▎ | 1297/1563 [00:38<00:07, 35.31it/s]

batch 1290 loss: 0.22337117344141005


Train, Epoch 19 / 20:  83%|████████▎ | 1305/1563 [00:38<00:07, 34.25it/s]

batch 1300 loss: 0.26701526790857316


Train, Epoch 19 / 20:  84%|████████▍ | 1317/1563 [00:38<00:07, 34.81it/s]

batch 1310 loss: 0.3601976439356804


Train, Epoch 19 / 20:  85%|████████▍ | 1325/1563 [00:39<00:06, 34.17it/s]

batch 1320 loss: 0.2151126891374588


Train, Epoch 19 / 20:  86%|████████▌ | 1337/1563 [00:39<00:06, 34.99it/s]

batch 1330 loss: 0.3325770005583763


Train, Epoch 19 / 20:  86%|████████▌ | 1345/1563 [00:39<00:06, 34.59it/s]

batch 1340 loss: 0.284940704703331


Train, Epoch 19 / 20:  87%|████████▋ | 1357/1563 [00:40<00:05, 34.54it/s]

batch 1350 loss: 0.2757699631154537


Train, Epoch 19 / 20:  87%|████████▋ | 1365/1563 [00:40<00:05, 33.98it/s]

batch 1360 loss: 0.2941825598478317


Train, Epoch 19 / 20:  88%|████████▊ | 1377/1563 [00:40<00:05, 34.96it/s]

batch 1370 loss: 0.32950182631611824


Train, Epoch 19 / 20:  89%|████████▊ | 1385/1563 [00:40<00:05, 34.91it/s]

batch 1380 loss: 0.22878305837512017


Train, Epoch 19 / 20:  89%|████████▉ | 1397/1563 [00:41<00:04, 34.85it/s]

batch 1390 loss: 0.24325358420610427


Train, Epoch 19 / 20:  90%|████████▉ | 1405/1563 [00:41<00:04, 34.80it/s]

batch 1400 loss: 0.2458284355700016


Train, Epoch 19 / 20:  91%|█████████ | 1417/1563 [00:41<00:04, 35.29it/s]

batch 1410 loss: 0.32953447103500366


Train, Epoch 19 / 20:  91%|█████████ | 1425/1563 [00:42<00:03, 35.10it/s]

batch 1420 loss: 0.2439726322889328


Train, Epoch 19 / 20:  92%|█████████▏| 1437/1563 [00:42<00:03, 34.86it/s]

batch 1430 loss: 0.2353248305618763


Train, Epoch 19 / 20:  92%|█████████▏| 1445/1563 [00:42<00:03, 34.91it/s]

batch 1440 loss: 0.21097082123160363


Train, Epoch 19 / 20:  93%|█████████▎| 1457/1563 [00:42<00:02, 35.38it/s]

batch 1450 loss: 0.27912010848522184


Train, Epoch 19 / 20:  94%|█████████▎| 1465/1563 [00:43<00:02, 35.04it/s]

batch 1460 loss: 0.27907306924462316


Train, Epoch 19 / 20:  94%|█████████▍| 1477/1563 [00:43<00:02, 34.18it/s]

batch 1470 loss: 0.18365527354180813


Train, Epoch 19 / 20:  95%|█████████▌| 1485/1563 [00:43<00:02, 34.18it/s]

batch 1480 loss: 0.2706627048552036


Train, Epoch 19 / 20:  96%|█████████▌| 1497/1563 [00:44<00:01, 35.16it/s]

batch 1490 loss: 0.22376233860850334


Train, Epoch 19 / 20:  96%|█████████▋| 1505/1563 [00:44<00:01, 34.67it/s]

batch 1500 loss: 0.24053149968385695


Train, Epoch 19 / 20:  97%|█████████▋| 1517/1563 [00:44<00:01, 35.31it/s]

batch 1510 loss: 0.2464643556624651


Train, Epoch 19 / 20:  98%|█████████▊| 1525/1563 [00:44<00:01, 34.93it/s]

batch 1520 loss: 0.24883348122239113


Train, Epoch 19 / 20:  98%|█████████▊| 1537/1563 [00:45<00:00, 35.17it/s]

batch 1530 loss: 0.28467385917901994


Train, Epoch 19 / 20:  99%|█████████▉| 1545/1563 [00:45<00:00, 34.86it/s]

batch 1540 loss: 0.2852896019816399


Train, Epoch 19 / 20: 100%|█████████▉| 1557/1563 [00:45<00:00, 34.75it/s]

batch 1550 loss: 0.2415548011660576


Train, Epoch 19 / 20: 100%|██████████| 1563/1563 [00:45<00:00, 33.98it/s]


batch 1560 loss: 0.25948863551020623


Test, Epoch 19 / 20: 100%|██████████| 1563/1563 [00:22<00:00, 69.43it/s]


Epoch 19, loss: 0.5326969559663535, accuracy: 0.8024


Train, Epoch 20 / 20:   1%|          | 16/1563 [00:00<00:44, 34.46it/s]

batch 10 loss: 0.28776606656610965


Train, Epoch 20 / 20:   2%|▏         | 24/1563 [00:00<00:44, 34.69it/s]

batch 20 loss: 0.3146580770611763


Train, Epoch 20 / 20:   2%|▏         | 36/1563 [00:01<00:43, 34.96it/s]

batch 30 loss: 0.21072824895381928


Train, Epoch 20 / 20:   3%|▎         | 44/1563 [00:01<00:43, 34.84it/s]

batch 40 loss: 0.34753637462854386


Train, Epoch 20 / 20:   4%|▎         | 56/1563 [00:01<00:43, 34.57it/s]

batch 50 loss: 0.1929243065416813


Train, Epoch 20 / 20:   4%|▍         | 64/1563 [00:01<00:42, 34.97it/s]

batch 60 loss: 0.20200628861784936


Train, Epoch 20 / 20:   5%|▍         | 76/1563 [00:02<00:42, 35.21it/s]

batch 70 loss: 0.2867088481783867


Train, Epoch 20 / 20:   5%|▌         | 84/1563 [00:02<00:42, 34.98it/s]

batch 80 loss: 0.2091379903256893


Train, Epoch 20 / 20:   6%|▌         | 96/1563 [00:02<00:42, 34.67it/s]

batch 90 loss: 0.30139626264572145


Train, Epoch 20 / 20:   7%|▋         | 104/1563 [00:02<00:41, 35.02it/s]

batch 100 loss: 0.257981638610363


Train, Epoch 20 / 20:   7%|▋         | 116/1563 [00:03<00:42, 34.34it/s]

batch 110 loss: 0.32619528770446776


Train, Epoch 20 / 20:   8%|▊         | 124/1563 [00:03<00:43, 33.13it/s]

batch 120 loss: 0.29810181707143785


Train, Epoch 20 / 20:   9%|▊         | 136/1563 [00:03<00:45, 31.60it/s]

batch 130 loss: 0.2045995771884918


Train, Epoch 20 / 20:   9%|▉         | 144/1563 [00:04<00:45, 30.95it/s]

batch 140 loss: 0.2571876883506775


Train, Epoch 20 / 20:  10%|▉         | 156/1563 [00:04<00:43, 32.69it/s]

batch 150 loss: 0.2578813783824444


Train, Epoch 20 / 20:  10%|█         | 164/1563 [00:04<00:42, 32.88it/s]

batch 160 loss: 0.23314924389123917


Train, Epoch 20 / 20:  11%|█▏        | 176/1563 [00:05<00:41, 33.40it/s]

batch 170 loss: 0.2669617600739002


Train, Epoch 20 / 20:  12%|█▏        | 184/1563 [00:05<00:43, 31.74it/s]

batch 180 loss: 0.2745468974113464


Train, Epoch 20 / 20:  13%|█▎        | 196/1563 [00:05<00:44, 30.48it/s]

batch 190 loss: 0.22520689964294432


Train, Epoch 20 / 20:  13%|█▎        | 203/1563 [00:06<00:46, 29.28it/s]

batch 200 loss: 0.21863097324967384


Train, Epoch 20 / 20:  14%|█▍        | 215/1563 [00:06<00:43, 30.81it/s]

batch 210 loss: 0.1734200969338417


Train, Epoch 20 / 20:  15%|█▍        | 227/1563 [00:06<00:39, 33.65it/s]

batch 220 loss: 0.3426970273256302


Train, Epoch 20 / 20:  15%|█▌        | 235/1563 [00:07<00:39, 33.95it/s]

batch 230 loss: 0.2257175363600254


Train, Epoch 20 / 20:  16%|█▌        | 247/1563 [00:07<00:37, 34.73it/s]

batch 240 loss: 0.22193634361028672


Train, Epoch 20 / 20:  16%|█▋        | 255/1563 [00:07<00:37, 34.91it/s]

batch 250 loss: 0.22519089356064798


Train, Epoch 20 / 20:  17%|█▋        | 267/1563 [00:08<00:37, 34.95it/s]

batch 260 loss: 0.3296350218355656


Train, Epoch 20 / 20:  18%|█▊        | 275/1563 [00:08<00:36, 34.87it/s]

batch 270 loss: 0.25532516166567804


Train, Epoch 20 / 20:  18%|█▊        | 287/1563 [00:08<00:36, 35.30it/s]

batch 280 loss: 0.2536663576960564


Train, Epoch 20 / 20:  19%|█▉        | 295/1563 [00:08<00:36, 34.86it/s]

batch 290 loss: 0.24223174899816513


Train, Epoch 20 / 20:  20%|█▉        | 307/1563 [00:09<00:35, 34.94it/s]

batch 300 loss: 0.20911885499954225


Train, Epoch 20 / 20:  20%|██        | 315/1563 [00:09<00:35, 35.04it/s]

batch 310 loss: 0.2858888581395149


Train, Epoch 20 / 20:  21%|██        | 327/1563 [00:09<00:34, 35.40it/s]

batch 320 loss: 0.23455870524048805


Train, Epoch 20 / 20:  21%|██▏       | 335/1563 [00:09<00:34, 35.45it/s]

batch 330 loss: 0.24733763225376607


Train, Epoch 20 / 20:  22%|██▏       | 347/1563 [00:10<00:34, 35.36it/s]

batch 340 loss: 0.2348851315677166


Train, Epoch 20 / 20:  23%|██▎       | 355/1563 [00:10<00:34, 35.22it/s]

batch 350 loss: 0.26498872637748716


Train, Epoch 20 / 20:  23%|██▎       | 367/1563 [00:10<00:34, 35.05it/s]

batch 360 loss: 0.2168670989573002


Train, Epoch 20 / 20:  24%|██▍       | 375/1563 [00:11<00:34, 34.76it/s]

batch 370 loss: 0.36420089825987817


Train, Epoch 20 / 20:  25%|██▍       | 387/1563 [00:11<00:33, 34.95it/s]

batch 380 loss: 0.28858553171157836


Train, Epoch 20 / 20:  25%|██▌       | 395/1563 [00:11<00:33, 34.94it/s]

batch 390 loss: 0.24672203361988068


Train, Epoch 20 / 20:  26%|██▌       | 407/1563 [00:12<00:33, 34.67it/s]

batch 400 loss: 0.2842236593365669


Train, Epoch 20 / 20:  27%|██▋       | 415/1563 [00:12<00:33, 34.40it/s]

batch 410 loss: 0.2394745945930481


Train, Epoch 20 / 20:  27%|██▋       | 427/1563 [00:12<00:32, 34.86it/s]

batch 420 loss: 0.320151786133647


Train, Epoch 20 / 20:  28%|██▊       | 435/1563 [00:12<00:32, 35.13it/s]

batch 430 loss: 0.21560827307403088


Train, Epoch 20 / 20:  29%|██▊       | 447/1563 [00:13<00:31, 34.97it/s]

batch 440 loss: 0.23161925598978997


Train, Epoch 20 / 20:  29%|██▉       | 455/1563 [00:13<00:31, 34.69it/s]

batch 450 loss: 0.28352734819054604


Train, Epoch 20 / 20:  30%|██▉       | 467/1563 [00:13<00:30, 35.37it/s]

batch 460 loss: 0.20966960825026035


Train, Epoch 20 / 20:  30%|███       | 475/1563 [00:13<00:31, 35.07it/s]

batch 470 loss: 0.3162419967353344


Train, Epoch 20 / 20:  31%|███       | 487/1563 [00:14<00:30, 34.91it/s]

batch 480 loss: 0.2715203985571861


Train, Epoch 20 / 20:  32%|███▏      | 495/1563 [00:14<00:30, 34.88it/s]

batch 490 loss: 0.26622001081705093


Train, Epoch 20 / 20:  32%|███▏      | 507/1563 [00:14<00:29, 35.44it/s]

batch 500 loss: 0.24125612825155257


Train, Epoch 20 / 20:  33%|███▎      | 515/1563 [00:15<00:29, 35.15it/s]

batch 510 loss: 0.27317976430058477


Train, Epoch 20 / 20:  34%|███▎      | 527/1563 [00:15<00:29, 35.20it/s]

batch 520 loss: 0.20946056507527827


Train, Epoch 20 / 20:  34%|███▍      | 535/1563 [00:15<00:29, 35.17it/s]

batch 530 loss: 0.250675143301487


Train, Epoch 20 / 20:  35%|███▍      | 547/1563 [00:16<00:29, 34.56it/s]

batch 540 loss: 0.21506980396807193


Train, Epoch 20 / 20:  36%|███▌      | 555/1563 [00:16<00:29, 34.28it/s]

batch 550 loss: 0.27214674577116965


Train, Epoch 20 / 20:  36%|███▌      | 563/1563 [00:16<00:29, 34.04it/s]

batch 560 loss: 0.2973888710141182


Train, Epoch 20 / 20:  37%|███▋      | 575/1563 [00:16<00:30, 32.47it/s]

batch 570 loss: 0.25383185744285586


Train, Epoch 20 / 20:  37%|███▋      | 583/1563 [00:17<00:30, 32.07it/s]

batch 580 loss: 0.26626727283000945


Train, Epoch 20 / 20:  38%|███▊      | 595/1563 [00:17<00:31, 31.12it/s]

batch 590 loss: 0.28980895355343816


Train, Epoch 20 / 20:  39%|███▊      | 603/1563 [00:17<00:30, 31.93it/s]

batch 600 loss: 0.23264744281768798


Train, Epoch 20 / 20:  39%|███▉      | 615/1563 [00:18<00:30, 31.30it/s]

batch 610 loss: 0.22550875842571258


Train, Epoch 20 / 20:  40%|███▉      | 623/1563 [00:18<00:29, 32.20it/s]

batch 620 loss: 0.25127013474702836


Train, Epoch 20 / 20:  41%|████      | 635/1563 [00:18<00:29, 31.37it/s]

batch 630 loss: 0.2853378228843212


Train, Epoch 20 / 20:  41%|████      | 643/1563 [00:19<00:28, 31.88it/s]

batch 640 loss: 0.2656139701604843


Train, Epoch 20 / 20:  42%|████▏     | 655/1563 [00:19<00:28, 32.03it/s]

batch 650 loss: 0.26077028065919877


Train, Epoch 20 / 20:  43%|████▎     | 667/1563 [00:19<00:27, 32.96it/s]

batch 660 loss: 0.25584477763623


Train, Epoch 20 / 20:  43%|████▎     | 675/1563 [00:20<00:26, 33.95it/s]

batch 670 loss: 0.2769795097410679


Train, Epoch 20 / 20:  44%|████▎     | 683/1563 [00:20<00:25, 34.54it/s]

batch 680 loss: 0.27694364339113237


Train, Epoch 20 / 20:  44%|████▍     | 695/1563 [00:20<00:24, 35.10it/s]

batch 690 loss: 0.3225280500948429


Train, Epoch 20 / 20:  45%|████▌     | 707/1563 [00:20<00:24, 35.08it/s]

batch 700 loss: 0.2849066823720932


Train, Epoch 20 / 20:  46%|████▌     | 715/1563 [00:21<00:23, 35.41it/s]

batch 710 loss: 0.22981838509440422


Train, Epoch 20 / 20:  47%|████▋     | 727/1563 [00:21<00:23, 34.83it/s]

batch 720 loss: 0.21089811641722916


Train, Epoch 20 / 20:  47%|████▋     | 735/1563 [00:21<00:23, 35.02it/s]

batch 730 loss: 0.24512489289045333


Train, Epoch 20 / 20:  48%|████▊     | 747/1563 [00:22<00:23, 35.23it/s]

batch 740 loss: 0.19510509297251702


Train, Epoch 20 / 20:  48%|████▊     | 755/1563 [00:22<00:22, 35.30it/s]

batch 750 loss: 0.3017526641488075


Train, Epoch 20 / 20:  49%|████▉     | 767/1563 [00:22<00:22, 34.82it/s]

batch 760 loss: 0.24822821989655494


Train, Epoch 20 / 20:  50%|████▉     | 775/1563 [00:22<00:22, 34.54it/s]

batch 770 loss: 0.23068684563040734


Train, Epoch 20 / 20:  50%|█████     | 787/1563 [00:23<00:22, 34.50it/s]

batch 780 loss: 0.2535159312188625


Train, Epoch 20 / 20:  51%|█████     | 795/1563 [00:23<00:22, 33.80it/s]

batch 790 loss: 0.2726580686867237


Train, Epoch 20 / 20:  52%|█████▏    | 807/1563 [00:23<00:22, 34.12it/s]

batch 800 loss: 0.25163969621062277


Train, Epoch 20 / 20:  52%|█████▏    | 815/1563 [00:24<00:21, 34.00it/s]

batch 810 loss: 0.2724343538284302


Train, Epoch 20 / 20:  53%|█████▎    | 827/1563 [00:24<00:21, 34.40it/s]

batch 820 loss: 0.29769530296325686


Train, Epoch 20 / 20:  53%|█████▎    | 835/1563 [00:24<00:20, 34.88it/s]

batch 830 loss: 0.28194448053836824


Train, Epoch 20 / 20:  54%|█████▍    | 847/1563 [00:25<00:20, 34.65it/s]

batch 840 loss: 0.21465484313666822


Train, Epoch 20 / 20:  55%|█████▍    | 855/1563 [00:25<00:20, 35.13it/s]

batch 850 loss: 0.2768016904592514


Train, Epoch 20 / 20:  55%|█████▌    | 867/1563 [00:25<00:19, 35.51it/s]

batch 860 loss: 0.30164741948246954


Train, Epoch 20 / 20:  56%|█████▌    | 875/1563 [00:25<00:19, 35.41it/s]

batch 870 loss: 0.2589862883090973


Train, Epoch 20 / 20:  57%|█████▋    | 887/1563 [00:26<00:19, 35.41it/s]

batch 880 loss: 0.23837213441729546


Train, Epoch 20 / 20:  57%|█████▋    | 895/1563 [00:26<00:19, 34.39it/s]

batch 890 loss: 0.26403027921915057


Train, Epoch 20 / 20:  58%|█████▊    | 907/1563 [00:26<00:19, 34.36it/s]

batch 900 loss: 0.28252449780702593


Train, Epoch 20 / 20:  59%|█████▊    | 915/1563 [00:26<00:18, 34.44it/s]

batch 910 loss: 0.195563892275095


Train, Epoch 20 / 20:  59%|█████▉    | 927/1563 [00:27<00:18, 34.58it/s]

batch 920 loss: 0.3865994080901146


Train, Epoch 20 / 20:  60%|█████▉    | 935/1563 [00:27<00:18, 34.59it/s]

batch 930 loss: 0.19227620176970958


Train, Epoch 20 / 20:  61%|██████    | 947/1563 [00:27<00:17, 34.84it/s]

batch 940 loss: 0.2638703197240829


Train, Epoch 20 / 20:  61%|██████    | 955/1563 [00:28<00:17, 34.44it/s]

batch 950 loss: 0.22399401739239694


Train, Epoch 20 / 20:  62%|██████▏   | 967/1563 [00:28<00:17, 34.20it/s]

batch 960 loss: 0.17741110324859619


Train, Epoch 20 / 20:  62%|██████▏   | 975/1563 [00:28<00:17, 34.51it/s]

batch 970 loss: 0.35056022051721814


Train, Epoch 20 / 20:  63%|██████▎   | 987/1563 [00:29<00:16, 35.06it/s]

batch 980 loss: 0.4305556662380695


Train, Epoch 20 / 20:  64%|██████▎   | 995/1563 [00:29<00:16, 34.80it/s]

batch 990 loss: 0.20065998360514642


Train, Epoch 20 / 20:  64%|██████▍   | 1007/1563 [00:29<00:16, 34.55it/s]

batch 1000 loss: 0.2805582210421562


Train, Epoch 20 / 20:  65%|██████▍   | 1015/1563 [00:29<00:16, 33.04it/s]

batch 1010 loss: 0.2514815375208855


Train, Epoch 20 / 20:  65%|██████▌   | 1023/1563 [00:30<00:16, 32.55it/s]

batch 1020 loss: 0.31074663400650027


Train, Epoch 20 / 20:  66%|██████▌   | 1035/1563 [00:30<00:16, 32.41it/s]

batch 1030 loss: 0.27198274806141853


Train, Epoch 20 / 20:  67%|██████▋   | 1043/1563 [00:30<00:16, 32.11it/s]

batch 1040 loss: 0.24088961631059647


Train, Epoch 20 / 20:  67%|██████▋   | 1055/1563 [00:31<00:15, 33.27it/s]

batch 1050 loss: 0.26329625621438024


Train, Epoch 20 / 20:  68%|██████▊   | 1063/1563 [00:31<00:15, 32.07it/s]

batch 1060 loss: 0.22262725234031677


Train, Epoch 20 / 20:  69%|██████▉   | 1075/1563 [00:31<00:15, 31.49it/s]

batch 1070 loss: 0.2563794683665037


Train, Epoch 20 / 20:  69%|██████▉   | 1083/1563 [00:32<00:15, 30.64it/s]

batch 1080 loss: 0.2897327311336994


Train, Epoch 20 / 20:  70%|███████   | 1095/1563 [00:32<00:15, 29.71it/s]

batch 1090 loss: 0.26751307025551796


Train, Epoch 20 / 20:  71%|███████   | 1103/1563 [00:32<00:15, 30.09it/s]

batch 1100 loss: 0.30329597145318987


Train, Epoch 20 / 20:  71%|███████▏  | 1115/1563 [00:33<00:13, 33.14it/s]

batch 1110 loss: 0.24244550839066506


Train, Epoch 20 / 20:  72%|███████▏  | 1127/1563 [00:33<00:12, 34.66it/s]

batch 1120 loss: 0.2101071711629629


Train, Epoch 20 / 20:  73%|███████▎  | 1135/1563 [00:33<00:12, 34.68it/s]

batch 1130 loss: 0.28115318715572357


Train, Epoch 20 / 20:  73%|███████▎  | 1147/1563 [00:33<00:12, 34.65it/s]

batch 1140 loss: 0.2865744173526764


Train, Epoch 20 / 20:  74%|███████▍  | 1155/1563 [00:34<00:11, 34.53it/s]

batch 1150 loss: 0.23055411726236344


Train, Epoch 20 / 20:  75%|███████▍  | 1167/1563 [00:34<00:11, 34.83it/s]

batch 1160 loss: 0.29993886649608614


Train, Epoch 20 / 20:  75%|███████▌  | 1175/1563 [00:34<00:11, 34.60it/s]

batch 1170 loss: 0.2473014198243618


Train, Epoch 20 / 20:  76%|███████▌  | 1187/1563 [00:35<00:10, 34.79it/s]

batch 1180 loss: 0.2875964343547821


Train, Epoch 20 / 20:  76%|███████▋  | 1195/1563 [00:35<00:10, 34.21it/s]

batch 1190 loss: 0.24450377523899078


Train, Epoch 20 / 20:  77%|███████▋  | 1207/1563 [00:35<00:10, 34.52it/s]

batch 1200 loss: 0.30259646102786064


Train, Epoch 20 / 20:  78%|███████▊  | 1215/1563 [00:35<00:10, 34.36it/s]

batch 1210 loss: 0.3112665623426437


Train, Epoch 20 / 20:  79%|███████▊  | 1227/1563 [00:36<00:09, 34.69it/s]

batch 1220 loss: 0.23387737646698953


Train, Epoch 20 / 20:  79%|███████▉  | 1235/1563 [00:36<00:09, 35.11it/s]

batch 1230 loss: 0.22942728847265242


Train, Epoch 20 / 20:  80%|███████▉  | 1247/1563 [00:36<00:09, 34.48it/s]

batch 1240 loss: 0.22530388981103897


Train, Epoch 20 / 20:  80%|████████  | 1255/1563 [00:37<00:08, 34.88it/s]

batch 1250 loss: 0.22561768740415572


Train, Epoch 20 / 20:  81%|████████  | 1267/1563 [00:37<00:08, 35.27it/s]

batch 1260 loss: 0.30414925515651703


Train, Epoch 20 / 20:  82%|████████▏ | 1275/1563 [00:37<00:08, 35.23it/s]

batch 1270 loss: 0.23987182825803757


Train, Epoch 20 / 20:  82%|████████▏ | 1287/1563 [00:38<00:07, 35.07it/s]

batch 1280 loss: 0.2804314337670803


Train, Epoch 20 / 20:  83%|████████▎ | 1295/1563 [00:38<00:07, 34.83it/s]

batch 1290 loss: 0.25653102844953535


Train, Epoch 20 / 20:  84%|████████▎ | 1307/1563 [00:38<00:07, 35.25it/s]

batch 1300 loss: 0.2267318956553936


Train, Epoch 20 / 20:  84%|████████▍ | 1315/1563 [00:38<00:07, 34.58it/s]

batch 1310 loss: 0.2687976978719234


Train, Epoch 20 / 20:  85%|████████▍ | 1327/1563 [00:39<00:06, 34.89it/s]

batch 1320 loss: 0.24076344519853593


Train, Epoch 20 / 20:  85%|████████▌ | 1335/1563 [00:39<00:06, 35.30it/s]

batch 1330 loss: 0.2770844876766205


Train, Epoch 20 / 20:  86%|████████▌ | 1347/1563 [00:39<00:06, 35.29it/s]

batch 1340 loss: 0.24875917229801417


Train, Epoch 20 / 20:  87%|████████▋ | 1355/1563 [00:39<00:06, 34.38it/s]

batch 1350 loss: 0.32637241333723066


Train, Epoch 20 / 20:  87%|████████▋ | 1367/1563 [00:40<00:05, 35.09it/s]

batch 1360 loss: 0.3401364266872406


Train, Epoch 20 / 20:  88%|████████▊ | 1375/1563 [00:40<00:05, 34.98it/s]

batch 1370 loss: 0.23800864443182945


Train, Epoch 20 / 20:  89%|████████▊ | 1387/1563 [00:40<00:04, 35.41it/s]

batch 1380 loss: 0.26326804906129836


Train, Epoch 20 / 20:  89%|████████▉ | 1395/1563 [00:41<00:04, 34.90it/s]

batch 1390 loss: 0.22708635106682779


Train, Epoch 20 / 20:  90%|█████████ | 1407/1563 [00:41<00:04, 35.25it/s]

batch 1400 loss: 0.2985186293721199


Train, Epoch 20 / 20:  91%|█████████ | 1415/1563 [00:41<00:04, 35.33it/s]

batch 1410 loss: 0.26894223093986513


Train, Epoch 20 / 20:  91%|█████████▏| 1427/1563 [00:42<00:03, 35.48it/s]

batch 1420 loss: 0.29710159748792647


Train, Epoch 20 / 20:  92%|█████████▏| 1435/1563 [00:42<00:03, 35.10it/s]

batch 1430 loss: 0.25653244704008105


Train, Epoch 20 / 20:  93%|█████████▎| 1447/1563 [00:42<00:03, 34.95it/s]

batch 1440 loss: 0.2565718054771423


Train, Epoch 20 / 20:  93%|█████████▎| 1455/1563 [00:42<00:03, 34.35it/s]

batch 1450 loss: 0.2760492376983166


Train, Epoch 20 / 20:  94%|█████████▎| 1463/1563 [00:43<00:03, 32.16it/s]

batch 1460 loss: 0.28292696103453635


Train, Epoch 20 / 20:  94%|█████████▍| 1475/1563 [00:43<00:02, 31.52it/s]

batch 1470 loss: 0.21590857803821564


Train, Epoch 20 / 20:  95%|█████████▍| 1483/1563 [00:43<00:02, 32.69it/s]

batch 1480 loss: 0.2159091368317604


Train, Epoch 20 / 20:  96%|█████████▌| 1495/1563 [00:44<00:02, 32.23it/s]

batch 1490 loss: 0.2199613220989704


Train, Epoch 20 / 20:  96%|█████████▌| 1503/1563 [00:44<00:01, 31.26it/s]

batch 1500 loss: 0.2657366514205933


Train, Epoch 20 / 20:  97%|█████████▋| 1515/1563 [00:44<00:01, 31.01it/s]

batch 1510 loss: 0.2582914888858795


Train, Epoch 20 / 20:  97%|█████████▋| 1523/1563 [00:45<00:01, 30.72it/s]

batch 1520 loss: 0.37798911333084106


Train, Epoch 20 / 20:  98%|█████████▊| 1535/1563 [00:45<00:00, 30.29it/s]

batch 1530 loss: 0.27694551199674605


Train, Epoch 20 / 20:  99%|█████████▊| 1543/1563 [00:45<00:00, 30.15it/s]

batch 1540 loss: 0.23295502215623856


Train, Epoch 20 / 20:  99%|█████████▉| 1555/1563 [00:46<00:00, 32.02it/s]

batch 1550 loss: 0.2991837538778782


Train, Epoch 20 / 20: 100%|██████████| 1563/1563 [00:46<00:00, 33.75it/s]


batch 1560 loss: 0.23616600632667542


Test, Epoch 20 / 20: 100%|██████████| 1563/1563 [00:22<00:00, 70.92it/s]

Epoch 20, loss: 0.5130145644637942, accuracy: 0.7996





# **2. Vectorized implementation of MoE layer that works with num_experts_per_token==1**

In [None]:
# Input: [batch_size, seq_len, hidden_size] - input embeddings
# Output: [batch_size, seq_len, hidden_size] - output embeddings
class VectorizedMoEForOneExpert(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.num_experts = config.num_experts
        self.hidden_size = config.hidden_size
        self.num_experts_per_token = config.num_experts_per_token
        self.capacity_factor = config.capacity_factor

        # You can change experts representation if you want
        self.experts = nn.ModuleList([MLP(config) for _ in range(self.num_experts)])
        self.router = Router(config)

    def forward(self, x):
        batch_size, seq_len, hidden_size = x.shape
        expert_capacity = torch.ceil(torch.tensor(batch_size * seq_len / self.num_experts * self.capacity_factor, device=x.device, dtype=torch.int))
        routing_weights = self.router(x)
        for i in range(self.num_experts):
            token_indices = torch.nonzero(routing_weights[:, :, i], as_tuple=False)
            if token_indices.shape[0] > expert_capacity:
                routing_weights[token_indices[expert_capacity:, 0], token_indices[expert_capacity:, 1], i] = 0

        expert_outputs = torch.zeros(batch_size, seq_len, self.hidden_size, device=x.device)
        for i in range(self.num_experts):
            expert_indices = torch.nonzero(routing_weights[:, :, i], as_tuple=False)
            expert_outputs[expert_indices[:, 0], expert_indices[:, 1]] = self.experts[i](x[expert_indices[:, 0], expert_indices[:, 1]])

        return expert_outputs