From Switch Transformer paper:

>In deep learning, models typically reuse the same parameters for all inputs. Mixture of Experts (MoE) defies this and instead selects different parameters for each incoming example. The result is a sparsely-activated model -- with outrageous numbers of parameters -- but a constant computational cost.

A vanilla Transformer block looks like this:

```python
class ModernTransformerBlock(nn.Module):
    def __init__(self, embed_dim, n_heads, up):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, n_heads)
        self.mlp = nn.Sequential(
            SwishGLU(embed_dim, embed_dim * up),
            nn.Linear(embed_dim * up, embed_dim),
        )
        self.pre_attn_norm = RMSNorm(embed_dim)
        self.pre_mlp_norm = RMSNorm(embed_dim)
    
    def forward(self, x):
        x = x + self.attn(self.pre_attn_norm(x))
        x = x + self.mlp(self.pre_mlp_norm(x))
        return x
```

The Mixture-of-Experts layer replaces the MLP layer. Instead of having one MLP layer, we have `num_experts` different MLP layers called *experts*.

The idea is to process a contextualized token, by sending it to a subset of experts. In this way we could efficiently increase the number of parameters of the model without affecting computational cost too much.

First, the token is fed into *router*, which determines to which experts a token should go to be processed. For computational reasons, there is a fixed limit on:
* how many tokens an expert can process, and
* by how many experts a token is processed.

# Grading
Your task is to implement a Mixture of Experts layer. You can get points for the following subtasks:
1.  (5 points) Naive implementation of MoE layer that works with `num_experts_per_token>=1`
2.  (5 points) Well-vectorized implementation of MoE layer that works with `num_experts_per_token=1`
3.  (5 points) Implementation of a script testing for 1. 2. implementations output equivalence and performance superiority of 2.
4.  (5 points) Well-vectorized implementation of MoE layer that works with `num_experts_per_token>=1`
5.  (Bonus 5 points) Use Huggingface's Trainer class and compare performance of randomly initialized MoE Transformer and standard Transformer on `https://huggingface.co/datasets/imdb` dataset.

20 points scored in this task is equivalent to at least 16% points achievable in this course.

Please submit your assignments until 15th of April, 18:00 CET.

# Rules
- You shouldn't change basic `forward` and `initialization` signatures of the main classes: `Router` and `MoE`. You can add additional arguments with default values.
- As an assignment, provide a Jupyter notebook with a short introduction at the top of what has been done and where.
- You can add or remove any other classes, though you should keep the behaviour of `MLP` class somehow.
- Sensible vectorization is good enough for the maximum amount of points. There is no need to optimize performance to the max, just show that you can identify opportunities for vectorization and you are able to implement complex vectorizations.
- If in doubt, direct questions to either Jan Ludziejewski or Juliusz Straszyński.
- A notebook that is hard to grade (crashing, obfuscated) might be scored for 0 points.

# Hints
- First, write a naive implementation, vectorized operations might be hard to analyze for correctness.
- You can make randomness deterministic by appropriate torch functions.
- If you have a hard time fulfilling fair randomness for token discarding, you can try keeping the earlier tokens.

In [1]:
%pip install torch_tb_profiler einops

Collecting torch_tb_profiler
  Downloading torch_tb_profiler-0.4.3-py3-none-any.whl (1.1 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.1/1.1 MB[0m [31m7.4 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting einops
  Downloading einops-0.7.0-py3-none-any.whl (44 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m44.6/44.6 kB[0m [31m2.6 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: einops, torch_tb_profiler
Successfully installed einops-0.7.0 torch_tb_profiler-0.4.3


In [2]:
from torch import nn
import torch
from transformers import PretrainedConfig
import torch.nn.functional as F
from einops import einsum

class MLP(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(config.hidden_size, config.intermediate_size),
            nn.ReLU(),
            nn.Linear(config.intermediate_size, config.hidden_size),
        )

    def forward(self, x):
        return self.mlp(x)

# Router
The router is a module which assigns tokens to experts. It answers two questions:
1. Which tokens should be assigned to which expert.
2. How much weight should be assigned to each expert. The weight is determined by similarity between the token embedding and the expert embedding

The following conditions must be satisfied:
1. The routing weights must sum to 1 for each token and be non-negative
2. A token should have exactly `num_experts_per_token` non-zero weights

In [3]:
# Input: [batch_size, seq_len, hidden_size] - input embeddings
# Output: [batch_size, seq_len, num_experts] - expert routing weights
class Router(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.num_experts_per_token = config.num_experts_per_token
        self.hidden_size = config.hidden_size
        self.num_experts = config.num_experts

        self.expert_embeddings = nn.Parameter(torch.randn(self.num_experts, self.hidden_size))
        torch.nn.init.kaiming_uniform_(self.expert_embeddings, nonlinearity='linear')

    def forward(self, x):
        pass

The MoE module is a module which wraps around a set of expert modules and a router module.

It takes input embeddings and routes them to the experts.

Each token is processed individually by a subset of experts.

The output token embedding is a weighted sum of the expert outputs.

The weights are determined by the router module.

The subset of experts is determined by non-zero weights in the routing output.

Additionally each expert might process at most `expert_capacity = ceil((batch_size * seq_len) / num_experts * capacity_factor)` tokens

Superfluous tokens to be discarded by a particular expert should be selected uniformly at random.

Discarding should be equivalent to setting the appropriate routing weight to 0, while other weights remain the same.

This means that a token is processed by at most num_experts_per_token experts with a sum of weights of at most 1.

Specifically, this could mean that a token is processed by 0 experts - in this case the resulting embedding should be a zero tensor.

In [4]:
import math

# Input: [batch_size, seq_len, hidden_size] - input embeddings
# Output: [batch_size, seq_len, hidden_size] - output embeddings
class MoE(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.num_experts = config.num_experts
        self.hidden_size = config.hidden_size
        self.num_experts_per_token = config.num_experts_per_token
        self.capacity_factor = config.capacity_factor

        # You can change experts representation if you want
        self.experts = nn.ModuleList([MLP(config) for _ in range(self.num_experts)])
        self.router = Router(config)

    def forward(self, x):
        batch_size, seq_len, hidden_size = x.shape
        expert_capacity = math.ceil(batch_size * seq_len / self.num_experts * self.capacity_factor)
        pass

# Configurations

In [5]:
base_config = dict(
    vocab_size=5000,
    max_position_embeddings=256,
    num_attention_heads=8,
    num_hidden_layers=4,
    hidden_dropout_prob=0.1,
    hidden_size=128,
    intermediate_size=512,
    num_labels=2
)

standard_config = PretrainedConfig(
    **base_config,
    ff_cls=MLP
)

moe_config = PretrainedConfig(
    **base_config,
    num_experts=4,
    capacity_factor=2.0,
    num_experts_per_token=1,
    ff_cls=MoE
)

# Basic Transformer-related classes

In [6]:
from einops import rearrange

class Embedding(nn.Module):
  def __init__(self, config):
    super(Embedding, self).__init__()
    self.word_embed = nn.Embedding(config.vocab_size, config.hidden_size)
    self.pos_embed = nn.Embedding(config.max_position_embeddings, config.hidden_size)
    self.dropout = nn.Dropout(config.hidden_dropout_prob)

  def forward(self, x):
    batch_size, seq_length = x.shape
    device = x.device
    positions = torch.arange(0, seq_length).expand(
        batch_size, seq_length).to(device)
    embedding = self.word_embed(x) + self.pos_embed(positions)
    return self.dropout(embedding)


class MHSelfAttention(nn.Module):
    def __init__(self, config: PretrainedConfig):
        super(MHSelfAttention, self).__init__()
        self.num_attention_heads = config.num_attention_heads
        self.hidden_size = config.hidden_size
        self.head_size = self.hidden_size // self.num_attention_heads
        self.num_attention_heads = config.num_attention_heads
        self.qkv = nn.Linear(self.hidden_size, 3 * self.hidden_size, bias=False)

    def forward(self, embeddings):
        batch_size, seq_length, hidden_size = embeddings.size()

        result = self.qkv(embeddings)
        q, k, v = rearrange(result, 'b s (qkv nah hdsz) -> qkv b nah s hdsz', nah=self.num_attention_heads, qkv=3).unbind(0)

        attention_scores = torch.matmul(q, k.transpose(-1, -2))
        attention_scores = attention_scores / math.sqrt(hidden_size)
        attention_probs = nn.Softmax(dim=-1)(attention_scores)

        contextualized_layer = torch.matmul(attention_probs, v)

        outputs = rearrange(contextualized_layer, 'b nah s hdsz -> b s (nah hdsz)')
        return outputs

class TransformerBlock(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.attention = MHSelfAttention(config)
        self.norm1 = nn.LayerNorm(config.hidden_size)
        self.norm2 = nn.LayerNorm(config.hidden_size)
        self.intermediate = config.ff_cls(config)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)

    def forward(self, x):
        x =  x + self.norm1(self.dropout(self.attention(x)))
        x =  x + self.norm2(self.dropout(self.intermediate(x)))
        return x

class TransformerClassifier(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.embeddings = Embedding(config)
        self.layer = nn.Sequential(*[TransformerBlock(config) for _ in range(config.num_hidden_layers)])
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

    def forward(self, input_ids, labels=None):
        embedding_output = self.embeddings(input_ids)
        encoding = self.layer(embedding_output)
        pooled_encoding = encoding.mean(dim=1)
        logits = self.classifier(pooled_encoding)
        loss = F.cross_entropy(logits, labels) if labels is not None else None
        return {
            'loss': loss,
            'logits': logits,
        }

# Tokenizer training

In [7]:
%pip install datasets

Collecting datasets
  Downloading datasets-2.18.0-py3-none-any.whl (510 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m510.5/510.5 kB[0m [31m8.9 MB/s[0m eta [36m0:00:00[0m
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl (116 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m116.3/116.3 kB[0m [31m10.1 MB/s[0m eta [36m0:00:00[0m
Collecting xxhash (from datasets)
  Downloading xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m194.1/194.1 kB[0m [31m11.0 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting multiprocess (from datasets)
  Downloading multiprocess-0.70.16-py310-none-any.whl (134 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m134.8/134.8 kB[0m [31m13.4 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: xxhash, dill, multiprocess, datasets
Successfully installed dataset

In [8]:
from tokenizers import ByteLevelBPETokenizer
from datasets import load_dataset
from tokenizers.processors import BertProcessing

dataset = load_dataset('imdb')

tokenizer = ByteLevelBPETokenizer()
tokenizer.train_from_iterator(
    dataset['train']['text'],
    vocab_size=base_config['vocab_size'],
    special_tokens=["<s>", "</s>", "<pad>"],
    min_frequency=2
)
tokenizer.post_processor = BertProcessing(
    ("</s>", tokenizer.token_to_id("</s>")),
    ("<s>", tokenizer.token_to_id("<s>")),
)

tokenizer.enable_truncation(max_length=base_config['max_position_embeddings'])
tokenizer.enable_padding(pad_id=tokenizer.token_to_id("<pad>"), pad_token="<pad>", length=base_config['max_position_embeddings'])
tokenizer.model_max_length = base_config['max_position_embeddings']
tokenizer.pad_token = "<pad>"

from transformers import Trainer, TrainingArguments

def tokenize(row):
    return {
        'input_ids': tokenizer.encode(row['text']).ids,
    }

tokenized_dataset = dataset.map(tokenize)

Downloading readme:   0%|          | 0.00/7.81k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/21.0M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/20.5M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/42.0M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/25000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/25000 [00:00<?, ? examples/s]

Generating unsupervised split:   0%|          | 0/50000 [00:00<?, ? examples/s]

Map:   0%|          | 0/25000 [00:00<?, ? examples/s]

Map:   0%|          | 0/25000 [00:00<?, ? examples/s]

Map:   0%|          | 0/50000 [00:00<?, ? examples/s]

# **1. Naive implementation of MoE layer that works with num_experts_per_token>=1 and leftmost token choosing strategy.**

In [9]:
# Input: [batch_size, seq_len, hidden_size] - input embeddings
# Output: [batch_size, seq_len, num_experts] - expert routing weights
class Router(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.num_experts_per_token = config.num_experts_per_token
        self.hidden_size = config.hidden_size
        self.num_experts = config.num_experts

        self.expert_embeddings = nn.Parameter(torch.randn(self.num_experts, self.hidden_size))
        torch.nn.init.kaiming_uniform_(self.expert_embeddings, nonlinearity='linear')

    def forward(self, x):
        batch_size, seq_len, hidden_size = x.shape
        similarity = einsum(x, self.expert_embeddings, 'b s h, e h -> b s e')
        top_experts = torch.topk(similarity, self.num_experts_per_token)
        softmaxed_topk_values = F.softmax(top_experts.values, dim=-1)
        mask = torch.zeros_like(similarity, dtype=torch.bool)
        mask = mask.scatter_(-1, top_experts.indices, 1)
        routing_weights = torch.zeros_like(similarity)
        routing_weights[mask] = softmaxed_topk_values.flatten()

        return routing_weights

In [10]:
# Input: [batch_size, seq_len, hidden_size] - input embeddings
# Output: [batch_size, seq_len, hidden_size] - output embeddings
class NaiveMoE(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.num_experts = config.num_experts
        self.hidden_size = config.hidden_size
        self.num_experts_per_token = config.num_experts_per_token
        self.capacity_factor = config.capacity_factor

        # You can change experts representation if you want
        self.experts = nn.ModuleList([MLP(config) for _ in range(self.num_experts)])
        self.router = Router(config)

    def forward(self, x):
        batch_size, seq_len, hidden_size = x.shape
        expert_capacity = torch.ceil(torch.tensor(batch_size * seq_len / self.num_experts * self.capacity_factor, device=x.device, dtype=torch.int))
        routing_weights = self.router(x)
        for i in range(self.num_experts):
            token_indices = torch.nonzero(routing_weights[:, :, i], as_tuple=False)
            if token_indices.shape[0] > expert_capacity:
                routing_weights[token_indices[expert_capacity:, 0], token_indices[expert_capacity:, 1], i] = 0

        expert_outputs = torch.zeros(batch_size, seq_len, self.hidden_size, device=x.device)
        for i in range(self.num_experts):
            expert_indices = torch.nonzero(routing_weights[:, :, i], as_tuple=False)
            expert_outputs[expert_indices[:, 0], expert_indices[:, 1]] = self.experts[i](x[expert_indices[:, 0], expert_indices[:, 1]])

        return expert_outputs

In [11]:
from torch.utils.data import DataLoader

naive_moe_config = PretrainedConfig(
    **base_config,
    num_experts=4,
    capacity_factor=2.0,
    num_experts_per_token=2,
    ff_cls=NaiveMoE
)

train_loader = DataLoader(tokenized_dataset['train'], batch_size=16, shuffle=True)
test_loader = DataLoader(tokenized_dataset['test'], batch_size=16, shuffle=False)

DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
model = TransformerClassifier(naive_moe_config).to(DEVICE)
# model = TransformerClassifier(standard_config).to(DEVICE)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)

In [12]:
from tqdm import tqdm

NUM_OF_EPOCHS = 20

for epoch in range(NUM_OF_EPOCHS):
    model.train()
    train_progress_bar = tqdm(train_loader, desc=f'Train, Epoch {epoch + 1} / {NUM_OF_EPOCHS}')
    running_loss = 0.
    for i, batch in enumerate(train_progress_bar):
        x, y = batch['input_ids'], batch['label']
        x = torch.stack(x, dim=1).to(DEVICE)
        y = y.to(DEVICE)
        optimizer.zero_grad()
        loss = model(x, y)['loss']
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

        if i % 10 == 9:
            last_loss = running_loss / 10 # avg loss per batch
            print('batch {} loss: {}'.format(i + 1, last_loss))
            running_loss = 0.

    model.eval()
    with torch.no_grad():
        total_loss = 0
        total_samples = 0
        correct_samples = 0
        test_progress_bar = tqdm(test_loader, desc=f'Test, Epoch {epoch + 1} / {NUM_OF_EPOCHS}')
        for batch in test_progress_bar:
            x, y = batch['input_ids'], batch['label']
            x = torch.stack(x, dim=1).to(DEVICE)
            y = y.to(DEVICE)
            logits = model(x)['logits']
            total_loss += F.cross_entropy(logits, y, reduction='sum').item()
            total_samples += y.shape[0]
            correct_samples += (logits.argmax(dim=-1) == y).sum().item()

        print(f'Epoch {epoch + 1}, loss: {total_loss / total_samples}, accuracy: {correct_samples / total_samples}')

Train, Epoch 1 / 20:   1%|          | 11/1563 [00:02<03:03,  8.46it/s]

batch 10 loss: 1.0349373400211335


Train, Epoch 1 / 20:   1%|▏         | 23/1563 [00:03<01:41, 15.18it/s]

batch 20 loss: 0.8920106291770935


Train, Epoch 1 / 20:   2%|▏         | 33/1563 [00:04<01:39, 15.34it/s]

batch 30 loss: 0.8101996958255768


Train, Epoch 1 / 20:   3%|▎         | 43/1563 [00:04<01:37, 15.51it/s]

batch 40 loss: 0.7471609890460968


Train, Epoch 1 / 20:   3%|▎         | 51/1563 [00:05<01:58, 12.77it/s]

batch 50 loss: 0.7084021866321564


Train, Epoch 1 / 20:   4%|▍         | 63/1563 [00:06<01:38, 15.18it/s]

batch 60 loss: 0.7095832407474518


Train, Epoch 1 / 20:   5%|▍         | 73/1563 [00:06<01:28, 16.84it/s]

batch 70 loss: 0.7351710438728333


Train, Epoch 1 / 20:   5%|▌         | 81/1563 [00:07<02:37,  9.39it/s]

batch 80 loss: 0.7763126134872437


Train, Epoch 1 / 20:   6%|▌         | 92/1563 [00:09<02:48,  8.73it/s]

batch 90 loss: 0.8216028213500977


Train, Epoch 1 / 20:   7%|▋         | 102/1563 [00:09<01:36, 15.15it/s]

batch 100 loss: 0.7581846296787262


Train, Epoch 1 / 20:   7%|▋         | 108/1563 [00:10<02:19, 10.41it/s]


KeyboardInterrupt: 

# **2. and 4. Vectorized implementation of MoE layer that works with num_experts_per_token>=1. Satisfies 2nd and 4th part of the task.**

In [70]:
# Input: [batch_size, seq_len, hidden_size] - input embeddings
# Output: [batch_size, seq_len, hidden_size] - output embeddings
class VectorizedMoE(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.num_experts = config.num_experts
        self.hidden_size = config.hidden_size
        self.num_experts_per_token = config.num_experts_per_token
        self.capacity_factor = config.capacity_factor

        # You can change experts representation if you want
        self.expert = torch.nn.Linear(self.hidden_size, self.hidden_size)
        self.expert_weights = torch.nn.Parameter(torch.stack([self.expert.weight for _ in range(self.num_experts)], dim=0))
        self.expert_biases = torch.nn.Parameter(torch.stack([self.expert.bias for _ in range(self.num_experts)], dim=0))
        self.router = Router(config)

    def forward(self, x):
        batch_size, seq_len, hidden_size = x.shape
        expert_capacity = torch.ceil(torch.tensor(batch_size * seq_len / self.num_experts * self.capacity_factor, device=x.device, dtype=torch.int))
        routing_weights = self.router(x)
        flat_routing_weights = routing_weights.view(-1, self.num_experts)  # Shape: [batch_size * seq_len, num_experts]
        topk_values, topk_indices = flat_routing_weights.topk(k=expert_capacity, dim=0)
        mask = torch.zeros_like(flat_routing_weights).bool()
        mask.scatter_(0, topk_indices, 1)
        flat_routing_weights = flat_routing_weights * mask.float()

        x_flat = x.reshape(-1, x.size(-1))
        inputs_expanded = x_flat.unsqueeze(1).expand(-1, self.num_experts, -1)
        weighted_inputs = inputs_expanded * flat_routing_weights.unsqueeze(-1)
        combined_inputs = weighted_inputs.reshape(-1, self.hidden_size)
        combined_outputs = torch.matmul(combined_inputs, self.expert_weights.view(-1, self.hidden_size).t()) + self.expert_biases.flatten()
        combined_outputs = combined_outputs.view(self.num_experts, batch_size * seq_len, self.num_experts, self.hidden_size)
        expert_outputs = torch.sum(combined_outputs * flat_routing_weights.unsqueeze(-1), dim=(0, 2))
        expert_outputs = expert_outputs.view(batch_size, seq_len, self.hidden_size)

        return expert_outputs

In [71]:
from torch.utils.data import DataLoader

vectorized_moe_config = PretrainedConfig(
    **base_config,
    num_experts=4,
    capacity_factor=2.0,
    num_experts_per_token=2,
    ff_cls=VectorizedMoE
)

train_loader = DataLoader(tokenized_dataset['train'], batch_size=16, shuffle=True)
test_loader = DataLoader(tokenized_dataset['test'], batch_size=16, shuffle=False)

DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
model = TransformerClassifier(vectorized_moe_config).to(DEVICE)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)

In [None]:
from tqdm import tqdm

NUM_OF_EPOCHS = 20

for epoch in range(NUM_OF_EPOCHS):
    model.train()
    train_progress_bar = tqdm(train_loader, desc=f'Train, Epoch {epoch + 1} / {NUM_OF_EPOCHS}')
    running_loss = 0.
    for i, batch in enumerate(train_progress_bar):
        x, y = batch['input_ids'], batch['label']
        x = torch.stack(x, dim=1).to(DEVICE)
        y = y.to(DEVICE)
        optimizer.zero_grad()
        loss = model(x, y)['loss']
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

        if i % 10 == 9:
            last_loss = running_loss / 10 # avg loss per batch
            print('batch {} loss: {}'.format(i + 1, last_loss))
            running_loss = 0.

    model.eval()
    with torch.no_grad():
        total_loss = 0
        total_samples = 0
        correct_samples = 0
        test_progress_bar = tqdm(test_loader, desc=f'Test, Epoch {epoch + 1} / {NUM_OF_EPOCHS}')
        for batch in test_progress_bar:
            x, y = batch['input_ids'], batch['label']
            x = torch.stack(x, dim=1).to(DEVICE)
            y = y.to(DEVICE)
            logits = model(x)['logits']
            total_loss += F.cross_entropy(logits, y, reduction='sum').item()
            total_samples += y.shape[0]
            correct_samples += (logits.argmax(dim=-1) == y).sum().item()

        print(f'Epoch {epoch + 1}, loss: {total_loss / total_samples}, accuracy: {correct_samples / total_samples}')

Train, Epoch 1 / 20:   1%|          | 13/1563 [00:00<01:09, 22.45it/s]

batch 10 loss: 0.2766726344823837


Train, Epoch 1 / 20:   1%|▏         | 22/1563 [00:00<01:04, 23.91it/s]

batch 20 loss: 0.2886638596653938


Train, Epoch 1 / 20:   2%|▏         | 34/1563 [00:01<01:02, 24.39it/s]

batch 30 loss: 0.2591233104467392


Train, Epoch 1 / 20:   3%|▎         | 43/1563 [00:01<01:02, 24.31it/s]

batch 40 loss: 0.31057392954826357


Train, Epoch 1 / 20:   4%|▎         | 55/1563 [00:02<01:00, 24.83it/s]

batch 50 loss: 0.3500113531947136


Train, Epoch 1 / 20:   4%|▍         | 64/1563 [00:02<01:00, 24.76it/s]

batch 60 loss: 0.2743122145533562


Train, Epoch 1 / 20:   5%|▍         | 73/1563 [00:03<01:00, 24.48it/s]

batch 70 loss: 0.31888297945261


Train, Epoch 1 / 20:   5%|▌         | 85/1563 [00:03<01:00, 24.44it/s]

batch 80 loss: 0.27324262708425523


Train, Epoch 1 / 20:   6%|▌         | 94/1563 [00:03<00:59, 24.65it/s]

batch 90 loss: 0.3056113369762897


Train, Epoch 1 / 20:   7%|▋         | 103/1563 [00:04<00:58, 24.82it/s]

batch 100 loss: 0.3045415073633194


Train, Epoch 1 / 20:   7%|▋         | 115/1563 [00:04<00:58, 24.87it/s]

batch 110 loss: 0.24308620244264603


Train, Epoch 1 / 20:   8%|▊         | 124/1563 [00:05<00:58, 24.56it/s]

batch 120 loss: 0.31419908404350283


Train, Epoch 1 / 20:   9%|▊         | 133/1563 [00:05<01:00, 23.67it/s]

batch 130 loss: 0.26195075958967207


Train, Epoch 1 / 20:   9%|▉         | 142/1563 [00:05<00:59, 23.96it/s]

batch 140 loss: 0.2512147530913353


Train, Epoch 1 / 20:  10%|▉         | 154/1563 [00:06<00:57, 24.37it/s]

batch 150 loss: 0.34145334661006926


Train, Epoch 1 / 20:  10%|█         | 163/1563 [00:06<01:01, 22.82it/s]

batch 160 loss: 0.23814160302281379


Train, Epoch 1 / 20:  11%|█         | 172/1563 [00:07<01:02, 22.23it/s]

batch 170 loss: 0.39445137456059454


Train, Epoch 1 / 20:  12%|█▏        | 184/1563 [00:07<01:01, 22.40it/s]

batch 180 loss: 0.2817846015095711


Train, Epoch 1 / 20:  12%|█▏        | 193/1563 [00:08<01:02, 21.76it/s]

batch 190 loss: 0.4022702813148499


Train, Epoch 1 / 20:  13%|█▎        | 202/1563 [00:08<01:03, 21.55it/s]

batch 200 loss: 0.2810122758150101


Train, Epoch 1 / 20:  14%|█▎        | 214/1563 [00:09<00:56, 23.88it/s]

batch 210 loss: 0.36494734287261965


Train, Epoch 1 / 20:  14%|█▍        | 223/1563 [00:09<00:54, 24.66it/s]

batch 220 loss: 0.29545731395483016


Train, Epoch 1 / 20:  15%|█▍        | 232/1563 [00:09<00:53, 24.71it/s]

batch 230 loss: 0.42759610116481783


Train, Epoch 1 / 20:  16%|█▌        | 244/1563 [00:10<00:55, 23.76it/s]

batch 240 loss: 0.3223703056573868


Train, Epoch 1 / 20:  16%|█▌        | 253/1563 [00:10<00:54, 24.23it/s]

batch 250 loss: 0.3184202089905739


Train, Epoch 1 / 20:  17%|█▋        | 265/1563 [00:11<00:52, 24.55it/s]

batch 260 loss: 0.3343494340777397


Train, Epoch 1 / 20:  18%|█▊        | 274/1563 [00:11<00:53, 24.27it/s]

batch 270 loss: 0.292835907638073


Train, Epoch 1 / 20:  18%|█▊        | 283/1563 [00:11<00:52, 24.41it/s]

batch 280 loss: 0.3518767774105072


Train, Epoch 1 / 20:  19%|█▉        | 295/1563 [00:12<00:51, 24.39it/s]

batch 290 loss: 0.36021568477153776


Train, Epoch 1 / 20:  19%|█▉        | 304/1563 [00:12<00:51, 24.34it/s]

batch 300 loss: 0.37414331883192065


Train, Epoch 1 / 20:  20%|██        | 313/1563 [00:13<00:51, 24.12it/s]

batch 310 loss: 0.29714060574769974


Train, Epoch 1 / 20:  21%|██        | 322/1563 [00:13<00:50, 24.49it/s]

batch 320 loss: 0.3030294470489025


Train, Epoch 1 / 20:  21%|██▏       | 334/1563 [00:13<00:50, 24.30it/s]

batch 330 loss: 0.3300422117114067


Train, Epoch 1 / 20:  22%|██▏       | 343/1563 [00:14<00:49, 24.60it/s]

batch 340 loss: 0.38443803191185


Train, Epoch 1 / 20:  23%|██▎       | 355/1563 [00:14<00:48, 24.67it/s]

batch 350 loss: 0.4077755518257618


Train, Epoch 1 / 20:  23%|██▎       | 364/1563 [00:15<00:49, 24.18it/s]

batch 360 loss: 0.31590947061777114


Train, Epoch 1 / 20:  24%|██▍       | 373/1563 [00:15<00:49, 24.01it/s]

batch 370 loss: 0.33724630028009417


Train, Epoch 1 / 20:  25%|██▍       | 385/1563 [00:16<00:48, 24.42it/s]

batch 380 loss: 0.33660186976194384


Train, Epoch 1 / 20:  25%|██▌       | 394/1563 [00:16<00:48, 24.33it/s]

batch 390 loss: 0.29330799765884874


Train, Epoch 1 / 20:  26%|██▌       | 403/1563 [00:16<00:47, 24.20it/s]

batch 400 loss: 0.26153324246406556


Train, Epoch 1 / 20:  26%|██▋       | 412/1563 [00:17<00:48, 23.93it/s]

batch 410 loss: 0.30333126783370973


Train, Epoch 1 / 20:  27%|██▋       | 424/1563 [00:17<00:47, 24.08it/s]

batch 420 loss: 0.4021012201905251


Train, Epoch 1 / 20:  28%|██▊       | 433/1563 [00:18<00:46, 24.30it/s]

batch 430 loss: 0.3447847247123718


Train, Epoch 1 / 20:  28%|██▊       | 442/1563 [00:18<00:45, 24.49it/s]

batch 440 loss: 0.3137070268392563


Train, Epoch 1 / 20:  29%|██▉       | 454/1563 [00:18<00:49, 22.38it/s]

batch 450 loss: 0.3252694010734558


Train, Epoch 1 / 20:  30%|██▉       | 463/1563 [00:19<00:51, 21.45it/s]

batch 460 loss: 0.2519796349108219


Train, Epoch 1 / 20:  30%|███       | 472/1563 [00:19<00:50, 21.71it/s]

batch 470 loss: 0.3557786226272583


Train, Epoch 1 / 20:  31%|███       | 484/1563 [00:20<00:49, 21.92it/s]

batch 480 loss: 0.3171763889491558


Train, Epoch 1 / 20:  32%|███▏      | 493/1563 [00:20<00:46, 22.87it/s]

batch 490 loss: 0.37291503995656966


Train, Epoch 1 / 20:  32%|███▏      | 505/1563 [00:21<00:43, 24.33it/s]

batch 500 loss: 0.30247472822666166


Train, Epoch 1 / 20:  33%|███▎      | 514/1563 [00:21<00:42, 24.67it/s]

batch 510 loss: 0.3437381699681282


Train, Epoch 1 / 20:  33%|███▎      | 523/1563 [00:21<00:42, 24.72it/s]

batch 520 loss: 0.44994084537029266


Train, Epoch 1 / 20:  34%|███▍      | 535/1563 [00:22<00:41, 24.71it/s]

batch 530 loss: 0.2904044911265373


Train, Epoch 1 / 20:  35%|███▍      | 544/1563 [00:22<00:41, 24.73it/s]

batch 540 loss: 0.39033285826444625


Train, Epoch 1 / 20:  35%|███▌      | 553/1563 [00:23<00:40, 24.77it/s]

batch 550 loss: 0.2542389616370201


Train, Epoch 1 / 20:  36%|███▌      | 565/1563 [00:23<00:40, 24.78it/s]

batch 560 loss: 0.34379192516207696


Train, Epoch 1 / 20:  37%|███▋      | 574/1563 [00:24<00:39, 24.81it/s]

batch 570 loss: 0.2881782576441765


Train, Epoch 1 / 20:  37%|███▋      | 583/1563 [00:24<00:40, 24.45it/s]

batch 580 loss: 0.2934445276856422


Train, Epoch 1 / 20:  38%|███▊      | 595/1563 [00:24<00:39, 24.82it/s]

batch 590 loss: 0.2998204782605171


Train, Epoch 1 / 20:  39%|███▊      | 604/1563 [00:25<00:38, 24.70it/s]

batch 600 loss: 0.35171882808208466


Train, Epoch 1 / 20:  39%|███▉      | 613/1563 [00:25<00:38, 24.54it/s]

batch 610 loss: 0.2853461816906929


Train, Epoch 1 / 20:  40%|███▉      | 625/1563 [00:26<00:37, 24.78it/s]

batch 620 loss: 0.31939870715141294


Train, Epoch 1 / 20:  41%|████      | 634/1563 [00:26<00:37, 24.58it/s]

batch 630 loss: 0.3280961111187935


Train, Epoch 1 / 20:  41%|████      | 643/1563 [00:26<00:37, 24.70it/s]

batch 640 loss: 0.2877744801342487


Train, Epoch 1 / 20:  42%|████▏     | 655/1563 [00:27<00:36, 24.69it/s]

batch 650 loss: 0.29163185358047483


Train, Epoch 1 / 20:  42%|████▏     | 664/1563 [00:27<00:36, 24.65it/s]

batch 660 loss: 0.2868714764714241


Train, Epoch 1 / 20:  43%|████▎     | 673/1563 [00:28<00:35, 24.73it/s]

batch 670 loss: 0.2774727404117584


Train, Epoch 1 / 20:  44%|████▎     | 682/1563 [00:28<00:35, 24.64it/s]

batch 680 loss: 0.44333207979798317


Train, Epoch 1 / 20:  44%|████▍     | 694/1563 [00:28<00:35, 24.75it/s]

batch 690 loss: 0.4046535685658455


Train, Epoch 1 / 20:  45%|████▍     | 703/1563 [00:29<00:34, 24.85it/s]

batch 700 loss: 0.35405546277761457


Train, Epoch 1 / 20:  46%|████▌     | 715/1563 [00:29<00:34, 24.74it/s]

batch 710 loss: 0.3324707716703415


Train, Epoch 1 / 20:  46%|████▋     | 724/1563 [00:30<00:33, 24.75it/s]

batch 720 loss: 0.4097690463066101


Train, Epoch 1 / 20:  47%|████▋     | 733/1563 [00:30<00:33, 24.55it/s]

batch 730 loss: 0.33953007832169535


Train, Epoch 1 / 20:  47%|████▋     | 742/1563 [00:30<00:35, 23.42it/s]

batch 740 loss: 0.40621945410966875


Train, Epoch 1 / 20:  48%|████▊     | 754/1563 [00:31<00:35, 22.51it/s]

batch 750 loss: 0.2905574679374695


Train, Epoch 1 / 20:  49%|████▉     | 763/1563 [00:31<00:36, 21.97it/s]

batch 760 loss: 0.3054989829659462


Train, Epoch 1 / 20:  49%|████▉     | 772/1563 [00:32<00:36, 21.67it/s]

batch 770 loss: 0.375459697842598


Train, Epoch 1 / 20:  50%|█████     | 784/1563 [00:32<00:34, 22.75it/s]

batch 780 loss: 0.351295205950737


Train, Epoch 1 / 20:  51%|█████     | 793/1563 [00:33<00:31, 24.08it/s]

batch 790 loss: 0.3416846811771393


Train, Epoch 1 / 20:  52%|█████▏    | 805/1563 [00:33<00:30, 24.61it/s]

batch 800 loss: 0.2914705917239189


Train, Epoch 1 / 20:  52%|█████▏    | 814/1563 [00:33<00:30, 24.75it/s]

batch 810 loss: 0.3629662379622459


Train, Epoch 1 / 20:  53%|█████▎    | 823/1563 [00:34<00:30, 24.59it/s]

batch 820 loss: 0.32235144823789597


Train, Epoch 1 / 20:  53%|█████▎    | 835/1563 [00:34<00:29, 24.75it/s]

batch 830 loss: 0.36822804510593415


Train, Epoch 1 / 20:  54%|█████▍    | 844/1563 [00:35<00:29, 24.79it/s]

batch 840 loss: 0.28015346080064774


Train, Epoch 1 / 20:  55%|█████▍    | 853/1563 [00:35<00:28, 24.70it/s]

batch 850 loss: 0.3456715628504753


Train, Epoch 1 / 20:  55%|█████▌    | 865/1563 [00:36<00:28, 24.76it/s]

batch 860 loss: 0.32784459441900254


Train, Epoch 1 / 20:  56%|█████▌    | 874/1563 [00:36<00:27, 24.83it/s]

batch 870 loss: 0.26245655566453935


Train, Epoch 1 / 20:  56%|█████▋    | 883/1563 [00:36<00:27, 24.75it/s]

batch 880 loss: 0.3034244626760483


Train, Epoch 1 / 20:  57%|█████▋    | 895/1563 [00:37<00:26, 24.78it/s]

batch 890 loss: 0.28563610315322874


Train, Epoch 1 / 20:  58%|█████▊    | 904/1563 [00:37<00:26, 25.01it/s]

batch 900 loss: 0.2823440782725811


Train, Epoch 1 / 20:  58%|█████▊    | 913/1563 [00:37<00:25, 25.24it/s]

batch 910 loss: 0.3564406827092171


Train, Epoch 1 / 20:  59%|█████▉    | 925/1563 [00:38<00:25, 24.74it/s]

batch 920 loss: 0.3407641306519508


Train, Epoch 1 / 20:  60%|█████▉    | 934/1563 [00:38<00:25, 24.37it/s]

batch 930 loss: 0.352698228508234


Train, Epoch 1 / 20:  60%|██████    | 943/1563 [00:39<00:24, 24.82it/s]

batch 940 loss: 0.3675050035119057


Train, Epoch 1 / 20:  61%|██████    | 955/1563 [00:39<00:24, 24.83it/s]

batch 950 loss: 0.27464827969670297


Train, Epoch 1 / 20:  62%|██████▏   | 964/1563 [00:40<00:24, 24.83it/s]

batch 960 loss: 0.2873018652200699


Train, Epoch 1 / 20:  62%|██████▏   | 973/1563 [00:40<00:23, 24.75it/s]

batch 970 loss: 0.3153077095746994


Train, Epoch 1 / 20:  63%|██████▎   | 982/1563 [00:40<00:23, 24.60it/s]

batch 980 loss: 0.36649033725261687


Train, Epoch 1 / 20:  64%|██████▎   | 994/1563 [00:41<00:23, 24.13it/s]

batch 990 loss: 0.29108971506357195


Train, Epoch 1 / 20:  64%|██████▍   | 1003/1563 [00:41<00:22, 24.55it/s]

batch 1000 loss: 0.35753636360168456


Train, Epoch 1 / 20:  65%|██████▍   | 1015/1563 [00:42<00:22, 24.57it/s]

batch 1010 loss: 0.42950029149651525


Train, Epoch 1 / 20:  66%|██████▌   | 1024/1563 [00:42<00:21, 24.90it/s]

batch 1020 loss: 0.27509209886193275


Train, Epoch 1 / 20:  66%|██████▌   | 1033/1563 [00:42<00:23, 22.23it/s]

batch 1030 loss: 0.3305599108338356


Train, Epoch 1 / 20:  67%|██████▋   | 1042/1563 [00:43<00:23, 21.73it/s]

batch 1040 loss: 0.3274805977940559


Train, Epoch 1 / 20:  67%|██████▋   | 1054/1563 [00:43<00:23, 22.04it/s]

batch 1050 loss: 0.29462156891822816


Train, Epoch 1 / 20:  68%|██████▊   | 1063/1563 [00:44<00:22, 21.78it/s]

batch 1060 loss: 0.3430894359946251


Train, Epoch 1 / 20:  69%|██████▊   | 1072/1563 [00:44<00:22, 21.61it/s]

batch 1070 loss: 0.3668249696493149


Train, Epoch 1 / 20:  69%|██████▉   | 1084/1563 [00:45<00:20, 23.80it/s]

batch 1080 loss: 0.36558069586753844


Train, Epoch 1 / 20:  70%|██████▉   | 1093/1563 [00:45<00:19, 24.33it/s]

batch 1090 loss: 0.3599484711885452


Train, Epoch 1 / 20:  71%|███████   | 1102/1563 [00:45<00:19, 24.06it/s]

batch 1100 loss: 0.271943187713623


Train, Epoch 1 / 20:  71%|███████▏  | 1114/1563 [00:46<00:18, 24.82it/s]

batch 1110 loss: 0.33389376997947695


Train, Epoch 1 / 20:  72%|███████▏  | 1123/1563 [00:46<00:17, 24.90it/s]

batch 1120 loss: 0.26954945623874665


Train, Epoch 1 / 20:  72%|███████▏  | 1132/1563 [00:47<00:17, 24.68it/s]

batch 1130 loss: 0.30702722668647764


Train, Epoch 1 / 20:  73%|███████▎  | 1144/1563 [00:47<00:17, 24.41it/s]

batch 1140 loss: 0.37493381053209307


Train, Epoch 1 / 20:  74%|███████▍  | 1153/1563 [00:48<00:16, 24.44it/s]

batch 1150 loss: 0.30817201137542727


Train, Epoch 1 / 20:  75%|███████▍  | 1165/1563 [00:48<00:16, 24.76it/s]

batch 1160 loss: 0.35261841118335724


Train, Epoch 1 / 20:  75%|███████▌  | 1174/1563 [00:48<00:15, 25.07it/s]

batch 1170 loss: 0.3860779538750648


Train, Epoch 1 / 20:  76%|███████▌  | 1183/1563 [00:49<00:15, 24.82it/s]

batch 1180 loss: 0.37089347243309023


Train, Epoch 1 / 20:  76%|███████▋  | 1195/1563 [00:49<00:14, 24.57it/s]

batch 1190 loss: 0.3613490015268326


Train, Epoch 1 / 20:  77%|███████▋  | 1204/1563 [00:50<00:14, 24.62it/s]

batch 1200 loss: 0.33307394534349444


Train, Epoch 1 / 20:  78%|███████▊  | 1213/1563 [00:50<00:14, 24.87it/s]

batch 1210 loss: 0.24489504098892212


Train, Epoch 1 / 20:  78%|███████▊  | 1225/1563 [00:50<00:13, 24.82it/s]

batch 1220 loss: 0.28318401277065275


Train, Epoch 1 / 20:  79%|███████▉  | 1234/1563 [00:51<00:13, 24.55it/s]

batch 1230 loss: 0.34165955856442454


Train, Epoch 1 / 20:  80%|███████▉  | 1243/1563 [00:51<00:12, 24.71it/s]

batch 1240 loss: 0.2905297502875328


Train, Epoch 1 / 20:  80%|████████  | 1255/1563 [00:52<00:12, 24.73it/s]

batch 1250 loss: 0.49748715162277224


Train, Epoch 1 / 20:  81%|████████  | 1264/1563 [00:52<00:12, 24.54it/s]

batch 1260 loss: 0.38706188797950747


Train, Epoch 1 / 20:  81%|████████▏ | 1273/1563 [00:52<00:11, 24.55it/s]

batch 1270 loss: 0.3975639700889587


Train, Epoch 1 / 20:  82%|████████▏ | 1285/1563 [00:53<00:11, 24.81it/s]

batch 1280 loss: 0.3620395168662071


Train, Epoch 1 / 20:  83%|████████▎ | 1294/1563 [00:53<00:10, 24.87it/s]

batch 1290 loss: 0.4437989205121994


Train, Epoch 1 / 20:  83%|████████▎ | 1303/1563 [00:54<00:10, 24.75it/s]

batch 1300 loss: 0.33056831285357474


Train, Epoch 1 / 20:  84%|████████▍ | 1312/1563 [00:54<00:10, 24.41it/s]

batch 1310 loss: 0.320328451693058


Train, Epoch 1 / 20:  85%|████████▍ | 1324/1563 [00:54<00:10, 23.52it/s]

batch 1320 loss: 0.31739068031311035


Train, Epoch 1 / 20:  85%|████████▌ | 1333/1563 [00:55<00:10, 22.23it/s]

batch 1330 loss: 0.35652573257684705


Train, Epoch 1 / 20:  86%|████████▌ | 1342/1563 [00:55<00:09, 22.37it/s]

batch 1340 loss: 0.3294738009572029


Train, Epoch 1 / 20:  87%|████████▋ | 1354/1563 [00:56<00:09, 21.42it/s]

batch 1350 loss: 0.29100749641656876


Train, Epoch 1 / 20:  87%|████████▋ | 1363/1563 [00:56<00:09, 20.88it/s]

batch 1360 loss: 0.3438190042972565


Train, Epoch 1 / 20:  88%|████████▊ | 1375/1563 [00:57<00:08, 23.39it/s]

batch 1370 loss: 0.3200289458036423


Train, Epoch 1 / 20:  89%|████████▊ | 1384/1563 [00:57<00:07, 24.37it/s]

batch 1380 loss: 0.2840874932706356


Train, Epoch 1 / 20:  89%|████████▉ | 1393/1563 [00:58<00:06, 24.66it/s]

batch 1390 loss: 0.34914787411689757


Train, Epoch 1 / 20:  90%|████████▉ | 1405/1563 [00:58<00:06, 24.99it/s]

batch 1400 loss: 0.39301468059420586


Train, Epoch 1 / 20:  90%|█████████ | 1414/1563 [00:58<00:06, 24.72it/s]

batch 1410 loss: 0.2919599786400795


Train, Epoch 1 / 20:  91%|█████████ | 1423/1563 [00:59<00:05, 24.40it/s]

batch 1420 loss: 0.3317485377192497


Train, Epoch 1 / 20:  92%|█████████▏| 1435/1563 [00:59<00:05, 24.72it/s]

batch 1430 loss: 0.4057007983326912


Train, Epoch 1 / 20:  92%|█████████▏| 1444/1563 [01:00<00:04, 24.50it/s]

batch 1440 loss: 0.3342812016606331


Train, Epoch 1 / 20:  93%|█████████▎| 1453/1563 [01:00<00:04, 24.70it/s]

batch 1450 loss: 0.4379023924469948


Train, Epoch 1 / 20:  94%|█████████▎| 1465/1563 [01:00<00:03, 24.60it/s]

batch 1460 loss: 0.34598219841718675


Train, Epoch 1 / 20:  94%|█████████▍| 1474/1563 [01:01<00:03, 23.90it/s]

batch 1470 loss: 0.28729773312807083


Train, Epoch 1 / 20:  95%|█████████▍| 1483/1563 [01:01<00:03, 24.50it/s]

batch 1480 loss: 0.3588501334190369


Train, Epoch 1 / 20:  96%|█████████▌| 1495/1563 [01:02<00:02, 24.66it/s]

batch 1490 loss: 0.3296908512711525


Train, Epoch 1 / 20:  96%|█████████▌| 1504/1563 [01:02<00:02, 24.67it/s]

batch 1500 loss: 0.3310796245932579


Train, Epoch 1 / 20:  97%|█████████▋| 1513/1563 [01:02<00:02, 24.50it/s]

batch 1510 loss: 0.39211954176425934


Train, Epoch 1 / 20:  97%|█████████▋| 1522/1563 [01:03<00:01, 23.87it/s]

batch 1520 loss: 0.347456394135952


Train, Epoch 1 / 20:  98%|█████████▊| 1534/1563 [01:03<00:01, 24.59it/s]

batch 1530 loss: 0.38183885961771014


Train, Epoch 1 / 20:  99%|█████████▊| 1543/1563 [01:04<00:00, 24.44it/s]

batch 1540 loss: 0.35556735545396806


Train, Epoch 1 / 20:  99%|█████████▉| 1552/1563 [01:04<00:00, 24.48it/s]

batch 1550 loss: 0.2927076943218708


Train, Epoch 1 / 20: 100%|██████████| 1563/1563 [01:04<00:00, 24.06it/s]


batch 1560 loss: 0.3718066304922104


Test, Epoch 1 / 20: 100%|██████████| 1563/1563 [00:29<00:00, 52.85it/s]


Epoch 1, loss: 0.43059260813772676, accuracy: 0.80596


Train, Epoch 2 / 20:   1%|          | 12/1563 [00:00<01:02, 24.99it/s]

batch 10 loss: 0.43686993718147277


Train, Epoch 2 / 20:   2%|▏         | 24/1563 [00:00<01:02, 24.59it/s]

batch 20 loss: 0.2856064051389694


Train, Epoch 2 / 20:   2%|▏         | 33/1563 [00:01<01:01, 24.71it/s]

batch 30 loss: 0.3329220965504646


Train, Epoch 2 / 20:   3%|▎         | 42/1563 [00:01<01:01, 24.71it/s]

batch 40 loss: 0.3473674848675728


Train, Epoch 2 / 20:   3%|▎         | 54/1563 [00:02<01:02, 24.21it/s]

batch 50 loss: 0.2824481979012489


Train, Epoch 2 / 20:   4%|▍         | 63/1563 [00:02<01:01, 24.49it/s]

batch 60 loss: 0.3712827518582344


Train, Epoch 2 / 20:   5%|▍         | 75/1563 [00:03<01:00, 24.70it/s]

batch 70 loss: 0.30649104714393616


Train, Epoch 2 / 20:   5%|▌         | 84/1563 [00:03<00:59, 25.01it/s]

batch 80 loss: 0.2542556770145893


Train, Epoch 2 / 20:   6%|▌         | 93/1563 [00:03<00:59, 24.52it/s]

batch 90 loss: 0.3672840163111687


Train, Epoch 2 / 20:   7%|▋         | 105/1563 [00:04<00:59, 24.47it/s]

batch 100 loss: 0.3568741902709007


Train, Epoch 2 / 20:   7%|▋         | 114/1563 [00:04<00:59, 24.18it/s]

batch 110 loss: 0.3229146361351013


Train, Epoch 2 / 20:   8%|▊         | 123/1563 [00:05<01:00, 23.87it/s]

batch 120 loss: 0.30989143550395964


Train, Epoch 2 / 20:   9%|▊         | 135/1563 [00:05<00:58, 24.42it/s]

batch 130 loss: 0.3941999450325966


Train, Epoch 2 / 20:   9%|▉         | 144/1563 [00:05<00:58, 24.37it/s]

batch 140 loss: 0.3813342034816742


Train, Epoch 2 / 20:  10%|▉         | 153/1563 [00:06<00:57, 24.69it/s]

batch 150 loss: 0.38916474878787993


Train, Epoch 2 / 20:  10%|█         | 162/1563 [00:06<00:58, 24.02it/s]

batch 160 loss: 0.28273397237062453


Train, Epoch 2 / 20:  11%|█         | 174/1563 [00:07<00:56, 24.38it/s]

batch 170 loss: 0.3656235247850418


Train, Epoch 2 / 20:  12%|█▏        | 183/1563 [00:07<00:56, 24.49it/s]

batch 180 loss: 0.338108591735363


Train, Epoch 2 / 20:  12%|█▏        | 195/1563 [00:07<00:55, 24.54it/s]

batch 190 loss: 0.316956102848053


Train, Epoch 2 / 20:  13%|█▎        | 204/1563 [00:08<00:55, 24.45it/s]

batch 200 loss: 0.33600242882966996


Train, Epoch 2 / 20:  14%|█▎        | 213/1563 [00:08<00:56, 23.92it/s]

batch 210 loss: 0.3041773110628128


Train, Epoch 2 / 20:  14%|█▍        | 222/1563 [00:09<00:59, 22.68it/s]

batch 220 loss: 0.313111712038517


Train, Epoch 2 / 20:  15%|█▍        | 234/1563 [00:09<01:01, 21.59it/s]

batch 230 loss: 0.29771034717559813


Train, Epoch 2 / 20:  16%|█▌        | 243/1563 [00:10<01:02, 20.98it/s]

batch 240 loss: 0.29671782404184344


Train, Epoch 2 / 20:  16%|█▌        | 252/1563 [00:10<01:02, 21.02it/s]

batch 250 loss: 0.32874417304992676


Train, Epoch 2 / 20:  17%|█▋        | 264/1563 [00:11<00:57, 22.76it/s]

batch 260 loss: 0.3113717630505562


Train, Epoch 2 / 20:  17%|█▋        | 273/1563 [00:11<00:54, 23.88it/s]

batch 270 loss: 0.34510254561901094


Train, Epoch 2 / 20:  18%|█▊        | 285/1563 [00:11<00:51, 24.59it/s]

batch 280 loss: 0.3128556728363037


Train, Epoch 2 / 20:  19%|█▉        | 294/1563 [00:12<00:52, 24.35it/s]

batch 290 loss: 0.3595954868942499


Train, Epoch 2 / 20:  19%|█▉        | 303/1563 [00:12<00:51, 24.43it/s]

batch 300 loss: 0.42731798738241195


Train, Epoch 2 / 20:  20%|█▉        | 312/1563 [00:13<00:52, 24.03it/s]

batch 310 loss: 0.4078137785196304


Train, Epoch 2 / 20:  21%|██        | 324/1563 [00:13<00:49, 24.83it/s]

batch 320 loss: 0.3670760035514832


Train, Epoch 2 / 20:  21%|██▏       | 333/1563 [00:13<00:49, 24.87it/s]

batch 330 loss: 0.32872607558965683


Train, Epoch 2 / 20:  22%|██▏       | 345/1563 [00:14<00:49, 24.58it/s]

batch 340 loss: 0.33114692717790606


Train, Epoch 2 / 20:  23%|██▎       | 354/1563 [00:14<00:50, 23.96it/s]

batch 350 loss: 0.38366842567920684


Train, Epoch 2 / 20:  23%|██▎       | 363/1563 [00:15<00:49, 24.15it/s]

batch 360 loss: 0.2667055331170559


Train, Epoch 2 / 20:  24%|██▍       | 375/1563 [00:15<00:47, 24.75it/s]

batch 370 loss: 0.31116136014461515


Train, Epoch 2 / 20:  25%|██▍       | 384/1563 [00:15<00:47, 24.67it/s]

batch 380 loss: 0.26209709569811823


Train, Epoch 2 / 20:  25%|██▌       | 393/1563 [00:16<00:47, 24.69it/s]

batch 390 loss: 0.29439336732029914


Train, Epoch 2 / 20:  26%|██▌       | 402/1563 [00:16<00:47, 24.37it/s]

batch 400 loss: 0.2662817224860191


Train, Epoch 2 / 20:  26%|██▋       | 414/1563 [00:17<00:47, 24.30it/s]

batch 410 loss: 0.32341189607977866


Train, Epoch 2 / 20:  27%|██▋       | 423/1563 [00:17<00:46, 24.65it/s]

batch 420 loss: 0.2763072565197945


Train, Epoch 2 / 20:  28%|██▊       | 435/1563 [00:18<00:45, 24.56it/s]

batch 430 loss: 0.23481523841619492


Train, Epoch 2 / 20:  28%|██▊       | 444/1563 [00:18<00:45, 24.49it/s]

batch 440 loss: 0.2385905921459198


Train, Epoch 2 / 20:  29%|██▉       | 453/1563 [00:18<00:44, 24.75it/s]

batch 450 loss: 0.27470550015568734


Train, Epoch 2 / 20:  30%|██▉       | 465/1563 [00:19<00:44, 24.65it/s]

batch 460 loss: 0.2934282571077347


Train, Epoch 2 / 20:  30%|███       | 474/1563 [00:19<00:44, 24.45it/s]

batch 470 loss: 0.2620144933462143


Train, Epoch 2 / 20:  31%|███       | 483/1563 [00:20<00:43, 24.66it/s]

batch 480 loss: 0.31321265250444413


Train, Epoch 2 / 20:  32%|███▏      | 495/1563 [00:20<00:43, 24.80it/s]

batch 490 loss: 0.3375547662377357


Train, Epoch 2 / 20:  32%|███▏      | 504/1563 [00:20<00:43, 24.51it/s]

batch 500 loss: 0.2977663815021515


Train, Epoch 2 / 20:  33%|███▎      | 513/1563 [00:21<00:47, 22.28it/s]

batch 510 loss: 0.30907254815101626


Train, Epoch 2 / 20:  33%|███▎      | 522/1563 [00:21<00:48, 21.53it/s]

batch 520 loss: 0.3547264412045479


Train, Epoch 2 / 20:  34%|███▍      | 534/1563 [00:22<00:48, 21.33it/s]

batch 530 loss: 0.33194275498390197


Train, Epoch 2 / 20:  35%|███▍      | 543/1563 [00:22<00:47, 21.31it/s]

batch 540 loss: 0.24862563461065293


Train, Epoch 2 / 20:  35%|███▌      | 552/1563 [00:23<00:45, 22.40it/s]

batch 550 loss: 0.3516828939318657


Train, Epoch 2 / 20:  36%|███▌      | 564/1563 [00:23<00:41, 24.00it/s]

batch 560 loss: 0.2809456646442413


Train, Epoch 2 / 20:  37%|███▋      | 573/1563 [00:23<00:40, 24.38it/s]

batch 570 loss: 0.4576829522848129


Train, Epoch 2 / 20:  37%|███▋      | 585/1563 [00:24<00:39, 24.63it/s]

batch 580 loss: 0.3287199065089226


Train, Epoch 2 / 20:  38%|███▊      | 594/1563 [00:24<00:39, 24.84it/s]

batch 590 loss: 0.3254243716597557


Train, Epoch 2 / 20:  39%|███▊      | 603/1563 [00:25<00:38, 24.69it/s]

batch 600 loss: 0.27958313226699827


Train, Epoch 2 / 20:  39%|███▉      | 615/1563 [00:25<00:38, 24.80it/s]

batch 610 loss: 0.3949595019221306


Train, Epoch 2 / 20:  40%|███▉      | 624/1563 [00:26<00:38, 24.38it/s]

batch 620 loss: 0.2906356886029243


Train, Epoch 2 / 20:  40%|████      | 633/1563 [00:26<00:38, 24.04it/s]

batch 630 loss: 0.27604137659072875


Train, Epoch 2 / 20:  41%|████▏     | 645/1563 [00:26<00:37, 24.43it/s]

batch 640 loss: 0.3714238092303276


Train, Epoch 2 / 20:  42%|████▏     | 654/1563 [00:27<00:36, 24.57it/s]

batch 650 loss: 0.29639111235737803


Train, Epoch 2 / 20:  42%|████▏     | 663/1563 [00:27<00:36, 24.81it/s]

batch 660 loss: 0.37027700543403624


Train, Epoch 2 / 20:  43%|████▎     | 672/1563 [00:28<00:36, 24.54it/s]

batch 670 loss: 0.3277878999710083


Train, Epoch 2 / 20:  44%|████▍     | 684/1563 [00:28<00:36, 24.32it/s]

batch 680 loss: 0.27638421058654783


Train, Epoch 2 / 20:  44%|████▍     | 693/1563 [00:28<00:35, 24.19it/s]

batch 690 loss: 0.3006404221057892


Train, Epoch 2 / 20:  45%|████▌     | 705/1563 [00:29<00:35, 24.44it/s]

batch 700 loss: 0.2804295033216476


Train, Epoch 2 / 20:  46%|████▌     | 714/1563 [00:29<00:34, 24.60it/s]

batch 710 loss: 0.3698600172996521


Train, Epoch 2 / 20:  46%|████▋     | 723/1563 [00:30<00:34, 24.45it/s]

batch 720 loss: 0.3577371180057526


Train, Epoch 2 / 20:  47%|████▋     | 732/1563 [00:30<00:34, 23.78it/s]

batch 730 loss: 0.2615091517567635


Train, Epoch 2 / 20:  48%|████▊     | 744/1563 [00:30<00:33, 24.48it/s]

batch 740 loss: 0.3958165466785431


Train, Epoch 2 / 20:  48%|████▊     | 753/1563 [00:31<00:32, 24.72it/s]

batch 750 loss: 0.361165377497673


Train, Epoch 2 / 20:  49%|████▉     | 765/1563 [00:31<00:32, 24.76it/s]

batch 760 loss: 0.2900048330426216


Train, Epoch 2 / 20:  50%|████▉     | 774/1563 [00:32<00:31, 24.80it/s]

batch 770 loss: 0.3053305711597204


Train, Epoch 2 / 20:  50%|█████     | 783/1563 [00:32<00:31, 24.75it/s]

batch 780 loss: 0.3525292694568634


Train, Epoch 2 / 20:  51%|█████     | 792/1563 [00:32<00:31, 24.19it/s]

batch 790 loss: 0.30516262724995613


Train, Epoch 2 / 20:  51%|█████▏    | 804/1563 [00:33<00:34, 22.19it/s]

batch 800 loss: 0.321744467318058


Train, Epoch 2 / 20:  52%|█████▏    | 813/1563 [00:33<00:33, 22.25it/s]

batch 810 loss: 0.2834912955760956


Train, Epoch 2 / 20:  53%|█████▎    | 822/1563 [00:34<00:33, 22.02it/s]

batch 820 loss: 0.3993960365653038


Train, Epoch 2 / 20:  53%|█████▎    | 834/1563 [00:34<00:34, 21.18it/s]

batch 830 loss: 0.2969891279935837


Train, Epoch 2 / 20:  54%|█████▍    | 843/1563 [00:35<00:31, 22.72it/s]

batch 840 loss: 0.29567896090447904


Train, Epoch 2 / 20:  55%|█████▍    | 852/1563 [00:35<00:29, 23.89it/s]

batch 850 loss: 0.320771611481905


Train, Epoch 2 / 20:  55%|█████▌    | 864/1563 [00:36<00:28, 24.58it/s]

batch 860 loss: 0.40155435502529147


Train, Epoch 2 / 20:  56%|█████▌    | 873/1563 [00:36<00:28, 24.60it/s]

batch 870 loss: 0.3432561635971069


Train, Epoch 2 / 20:  57%|█████▋    | 885/1563 [00:36<00:27, 24.82it/s]

batch 880 loss: 0.2738906741142273


Train, Epoch 2 / 20:  57%|█████▋    | 894/1563 [00:37<00:26, 24.79it/s]

batch 890 loss: 0.30052113234996797


Train, Epoch 2 / 20:  58%|█████▊    | 903/1563 [00:37<00:26, 24.70it/s]

batch 900 loss: 0.2549113541841507


Train, Epoch 2 / 20:  58%|█████▊    | 912/1563 [00:38<00:26, 24.70it/s]

batch 910 loss: 0.4305694460868835


Train, Epoch 2 / 20:  59%|█████▉    | 924/1563 [00:38<00:25, 24.61it/s]

batch 920 loss: 0.2855188138782978


Train, Epoch 2 / 20:  60%|█████▉    | 933/1563 [00:38<00:25, 24.71it/s]

batch 930 loss: 0.3404510527849197


Train, Epoch 2 / 20:  60%|██████    | 942/1563 [00:39<00:25, 24.71it/s]

batch 940 loss: 0.3199041813611984


Train, Epoch 2 / 20:  61%|██████    | 954/1563 [00:39<00:24, 24.59it/s]

batch 950 loss: 0.3163865938782692


Train, Epoch 2 / 20:  62%|██████▏   | 963/1563 [00:40<00:24, 24.75it/s]

batch 960 loss: 0.35762783885002136


Train, Epoch 2 / 20:  62%|██████▏   | 975/1563 [00:40<00:23, 24.73it/s]

batch 970 loss: 0.2895986527204514


Train, Epoch 2 / 20:  63%|██████▎   | 984/1563 [00:40<00:23, 24.88it/s]

batch 980 loss: 0.3704542741179466


Train, Epoch 2 / 20:  64%|██████▎   | 993/1563 [00:41<00:23, 24.68it/s]

batch 990 loss: 0.3439012922346592


Train, Epoch 2 / 20:  64%|██████▍   | 1002/1563 [00:41<00:23, 24.12it/s]

batch 1000 loss: 0.41138105243444445


Train, Epoch 2 / 20:  65%|██████▍   | 1014/1563 [00:42<00:22, 24.57it/s]

batch 1010 loss: 0.34657002389431


Train, Epoch 2 / 20:  65%|██████▌   | 1023/1563 [00:42<00:22, 24.32it/s]

batch 1020 loss: 0.34935481250286105


Train, Epoch 2 / 20:  66%|██████▌   | 1032/1563 [00:42<00:21, 24.25it/s]

batch 1030 loss: 0.28446606919169426


Train, Epoch 2 / 20:  67%|██████▋   | 1044/1563 [00:43<00:21, 23.98it/s]

batch 1040 loss: 0.29422907829284667


Train, Epoch 2 / 20:  67%|██████▋   | 1053/1563 [00:43<00:20, 24.52it/s]

batch 1050 loss: 0.2946022152900696


Train, Epoch 2 / 20:  68%|██████▊   | 1065/1563 [00:44<00:20, 24.79it/s]

batch 1060 loss: 0.3207365214824677


Train, Epoch 2 / 20:  69%|██████▊   | 1074/1563 [00:44<00:19, 24.54it/s]

batch 1070 loss: 0.3056363955140114


Train, Epoch 2 / 20:  69%|██████▉   | 1083/1563 [00:45<00:20, 23.80it/s]

batch 1080 loss: 0.41330641955137254


Train, Epoch 2 / 20:  70%|██████▉   | 1092/1563 [00:45<00:21, 22.10it/s]

batch 1090 loss: 0.2600709691643715


Train, Epoch 2 / 20:  71%|███████   | 1104/1563 [00:46<00:20, 22.01it/s]

batch 1100 loss: 0.3049697533249855


Train, Epoch 2 / 20:  71%|███████   | 1113/1563 [00:46<00:20, 22.00it/s]

batch 1110 loss: 0.2953291229903698


Train, Epoch 2 / 20:  72%|███████▏  | 1122/1563 [00:46<00:20, 21.11it/s]

batch 1120 loss: 0.33544220179319384


Train, Epoch 2 / 20:  73%|███████▎  | 1134/1563 [00:47<00:18, 22.69it/s]

batch 1130 loss: 0.27087654545903206


Train, Epoch 2 / 20:  73%|███████▎  | 1143/1563 [00:47<00:17, 23.99it/s]

batch 1140 loss: 0.35633694380521774


Train, Epoch 2 / 20:  74%|███████▍  | 1155/1563 [00:48<00:16, 24.30it/s]

batch 1150 loss: 0.2832542508840561


Train, Epoch 2 / 20:  74%|███████▍  | 1164/1563 [00:48<00:16, 24.59it/s]

batch 1160 loss: 0.2859543666243553


Train, Epoch 2 / 20:  75%|███████▌  | 1173/1563 [00:49<00:16, 24.32it/s]

batch 1170 loss: 0.3485731169581413


Train, Epoch 2 / 20:  76%|███████▌  | 1182/1563 [00:49<00:15, 23.97it/s]

batch 1180 loss: 0.28942266702651975


Train, Epoch 2 / 20:  76%|███████▋  | 1194/1563 [00:49<00:15, 24.24it/s]

batch 1190 loss: 0.36150852739810946


Train, Epoch 2 / 20:  77%|███████▋  | 1203/1563 [00:50<00:14, 24.38it/s]

batch 1200 loss: 0.3404958724975586


Train, Epoch 2 / 20:  78%|███████▊  | 1215/1563 [00:50<00:14, 24.78it/s]

batch 1210 loss: 0.38144680857658386


Train, Epoch 2 / 20:  78%|███████▊  | 1224/1563 [00:51<00:13, 24.41it/s]

batch 1220 loss: 0.3646652281284332


Train, Epoch 2 / 20:  79%|███████▉  | 1233/1563 [00:51<00:13, 24.27it/s]

batch 1230 loss: 0.38996029198169707


Train, Epoch 2 / 20:  80%|███████▉  | 1245/1563 [00:51<00:12, 24.56it/s]

batch 1240 loss: 0.3499880224466324


Train, Epoch 2 / 20:  80%|████████  | 1254/1563 [00:52<00:12, 24.67it/s]

batch 1250 loss: 0.3253841243684292


Train, Epoch 2 / 20:  81%|████████  | 1263/1563 [00:52<00:12, 24.66it/s]

batch 1260 loss: 0.33343545347452164


Train, Epoch 2 / 20:  81%|████████▏ | 1272/1563 [00:53<00:11, 24.43it/s]

batch 1270 loss: 0.35840228348970415


Train, Epoch 2 / 20:  82%|████████▏ | 1284/1563 [00:53<00:11, 24.74it/s]

batch 1280 loss: 0.3707777768373489


Train, Epoch 2 / 20:  83%|████████▎ | 1293/1563 [00:53<00:10, 24.90it/s]

batch 1290 loss: 0.24701641798019408


Train, Epoch 2 / 20:  83%|████████▎ | 1305/1563 [00:54<00:10, 24.74it/s]

batch 1300 loss: 0.38035951405763624


Train, Epoch 2 / 20:  84%|████████▍ | 1314/1563 [00:54<00:10, 24.63it/s]

batch 1310 loss: 0.30887592434883115


Train, Epoch 2 / 20:  85%|████████▍ | 1323/1563 [00:55<00:09, 24.69it/s]

batch 1320 loss: 0.3214855924248695


Train, Epoch 2 / 20:  85%|████████▌ | 1335/1563 [00:55<00:09, 24.74it/s]

batch 1330 loss: 0.26360511481761933


Train, Epoch 2 / 20:  86%|████████▌ | 1344/1563 [00:55<00:08, 24.50it/s]

batch 1340 loss: 0.2708613224327564


Train, Epoch 2 / 20:  87%|████████▋ | 1353/1563 [00:56<00:08, 24.73it/s]

batch 1350 loss: 0.3929178208112717


Train, Epoch 2 / 20:  87%|████████▋ | 1365/1563 [00:56<00:07, 25.04it/s]

batch 1360 loss: 0.3524350456893444


Train, Epoch 2 / 20:  88%|████████▊ | 1374/1563 [00:57<00:07, 24.29it/s]

batch 1370 loss: 0.39941706210374833


Train, Epoch 2 / 20:  88%|████████▊ | 1383/1563 [00:57<00:07, 22.57it/s]

batch 1380 loss: 0.3118152476847172


Train, Epoch 2 / 20:  89%|████████▉ | 1392/1563 [00:58<00:07, 22.40it/s]

batch 1390 loss: 0.30593269020318986


Train, Epoch 2 / 20:  90%|████████▉ | 1404/1563 [00:58<00:07, 21.87it/s]

batch 1400 loss: 0.3442560777068138


Train, Epoch 2 / 20:  90%|█████████ | 1413/1563 [00:58<00:06, 21.87it/s]

batch 1410 loss: 0.3000264331698418


Train, Epoch 2 / 20:  91%|█████████ | 1425/1563 [00:59<00:05, 23.20it/s]

batch 1420 loss: 0.3240217551589012


Train, Epoch 2 / 20:  92%|█████████▏| 1434/1563 [00:59<00:05, 24.24it/s]

batch 1430 loss: 0.2731920272111893


Train, Epoch 2 / 20:  92%|█████████▏| 1443/1563 [01:00<00:04, 24.25it/s]

batch 1440 loss: 0.3236252129077911


Train, Epoch 2 / 20:  93%|█████████▎| 1455/1563 [01:00<00:04, 24.87it/s]

batch 1450 loss: 0.2931600347161293


Train, Epoch 2 / 20:  94%|█████████▎| 1464/1563 [01:01<00:03, 24.93it/s]

batch 1460 loss: 0.31293133795261385


Train, Epoch 2 / 20:  94%|█████████▍| 1473/1563 [01:01<00:03, 24.55it/s]

batch 1470 loss: 0.3832743287086487


Train, Epoch 2 / 20:  95%|█████████▌| 1485/1563 [01:01<00:03, 24.70it/s]

batch 1480 loss: 0.3481237590312958


Train, Epoch 2 / 20:  96%|█████████▌| 1494/1563 [01:02<00:02, 24.85it/s]

batch 1490 loss: 0.3062587797641754


Train, Epoch 2 / 20:  96%|█████████▌| 1503/1563 [01:02<00:02, 24.87it/s]

batch 1500 loss: 0.2975587897002697


Train, Epoch 2 / 20:  97%|█████████▋| 1515/1563 [01:03<00:01, 25.01it/s]

batch 1510 loss: 0.32166707813739776


Train, Epoch 2 / 20:  98%|█████████▊| 1524/1563 [01:03<00:01, 24.39it/s]

batch 1520 loss: 0.27075456231832506


Train, Epoch 2 / 20:  98%|█████████▊| 1533/1563 [01:03<00:01, 24.82it/s]

batch 1530 loss: 0.2677240177989006


Train, Epoch 2 / 20:  99%|█████████▉| 1545/1563 [01:04<00:00, 24.87it/s]

batch 1540 loss: 0.32490474358201027


Train, Epoch 2 / 20:  99%|█████████▉| 1554/1563 [01:04<00:00, 24.88it/s]

batch 1550 loss: 0.31853189766407014


Train, Epoch 2 / 20: 100%|██████████| 1563/1563 [01:05<00:00, 24.02it/s]


batch 1560 loss: 0.33365279585123064


Test, Epoch 2 / 20: 100%|██████████| 1563/1563 [00:29<00:00, 52.75it/s]


Epoch 2, loss: 0.4618800037497282, accuracy: 0.79936


Train, Epoch 3 / 20:   1%|          | 12/1563 [00:00<01:11, 21.83it/s]

batch 10 loss: 0.3612811520695686


Train, Epoch 3 / 20:   2%|▏         | 24/1563 [00:01<01:13, 20.98it/s]

batch 20 loss: 0.3021635740995407


Train, Epoch 3 / 20:   2%|▏         | 33/1563 [00:01<01:05, 23.30it/s]

batch 30 loss: 0.23443946987390518


Train, Epoch 3 / 20:   3%|▎         | 42/1563 [00:01<01:03, 23.86it/s]

batch 40 loss: 0.3039336487650871


Train, Epoch 3 / 20:   3%|▎         | 54/1563 [00:02<01:01, 24.57it/s]

batch 50 loss: 0.2900997221469879


Train, Epoch 3 / 20:   4%|▍         | 63/1563 [00:02<01:01, 24.41it/s]

batch 60 loss: 0.31989234164357183


Train, Epoch 3 / 20:   5%|▍         | 75/1563 [00:03<01:00, 24.64it/s]

batch 70 loss: 0.2808642238378525


Train, Epoch 3 / 20:   5%|▌         | 84/1563 [00:03<00:59, 24.77it/s]

batch 80 loss: 0.3438254475593567


Train, Epoch 3 / 20:   6%|▌         | 93/1563 [00:03<00:59, 24.76it/s]

batch 90 loss: 0.31006359606981276


Train, Epoch 3 / 20:   7%|▋         | 105/1563 [00:04<00:58, 24.74it/s]

batch 100 loss: 0.3306683346629143


Train, Epoch 3 / 20:   7%|▋         | 114/1563 [00:04<00:58, 24.67it/s]

batch 110 loss: 0.3109341710805893


Train, Epoch 3 / 20:   8%|▊         | 123/1563 [00:05<00:58, 24.56it/s]

batch 120 loss: 0.3469917833805084


Train, Epoch 3 / 20:   8%|▊         | 132/1563 [00:05<00:58, 24.62it/s]

batch 130 loss: 0.28828305974602697


Train, Epoch 3 / 20:   9%|▉         | 144/1563 [00:06<00:57, 24.62it/s]

batch 140 loss: 0.2819612056016922


Train, Epoch 3 / 20:  10%|▉         | 153/1563 [00:06<00:57, 24.36it/s]

batch 150 loss: 0.33430557399988176


Train, Epoch 3 / 20:  11%|█         | 165/1563 [00:06<00:57, 24.50it/s]

batch 160 loss: 0.31617102921009066


Train, Epoch 3 / 20:  11%|█         | 174/1563 [00:07<00:56, 24.68it/s]

batch 170 loss: 0.3109236598014832


Train, Epoch 3 / 20:  12%|█▏        | 183/1563 [00:07<00:55, 24.75it/s]

batch 180 loss: 0.34172007739543914


Train, Epoch 3 / 20:  12%|█▏        | 195/1563 [00:08<00:55, 24.61it/s]

batch 190 loss: 0.2767410188913345


Train, Epoch 3 / 20:  13%|█▎        | 204/1563 [00:08<00:54, 24.90it/s]

batch 200 loss: 0.3549062669277191


Train, Epoch 3 / 20:  14%|█▎        | 213/1563 [00:08<00:54, 24.75it/s]

batch 210 loss: 0.3025678895413876


Train, Epoch 3 / 20:  14%|█▍        | 225/1563 [00:09<00:53, 24.82it/s]

batch 220 loss: 0.2647088892757893


Train, Epoch 3 / 20:  15%|█▍        | 234/1563 [00:09<00:53, 24.75it/s]

batch 230 loss: 0.36846083253622053


Train, Epoch 3 / 20:  16%|█▌        | 243/1563 [00:10<00:53, 24.57it/s]

batch 240 loss: 0.2985044106841087


Train, Epoch 3 / 20:  16%|█▋        | 255/1563 [00:10<00:52, 24.74it/s]

batch 250 loss: 0.3113438993692398


Train, Epoch 3 / 20:  17%|█▋        | 264/1563 [00:10<00:52, 24.79it/s]

batch 260 loss: 0.32415861189365386


Train, Epoch 3 / 20:  17%|█▋        | 273/1563 [00:11<00:54, 23.81it/s]

batch 270 loss: 0.29124501198530195


Train, Epoch 3 / 20:  18%|█▊        | 282/1563 [00:11<00:56, 22.51it/s]

batch 280 loss: 0.2928178787231445


Train, Epoch 3 / 20:  19%|█▉        | 294/1563 [00:12<00:58, 21.84it/s]

batch 290 loss: 0.35746983587741854


Train, Epoch 3 / 20:  19%|█▉        | 303/1563 [00:12<00:57, 22.00it/s]

batch 300 loss: 0.3146261438727379


Train, Epoch 3 / 20:  20%|█▉        | 312/1563 [00:13<00:57, 21.83it/s]

batch 310 loss: 0.26596259921789167


Train, Epoch 3 / 20:  21%|██        | 324/1563 [00:13<00:53, 23.03it/s]

batch 320 loss: 0.21799768023192884


Train, Epoch 3 / 20:  21%|██▏       | 333/1563 [00:13<00:51, 23.72it/s]

batch 330 loss: 0.37586979269981385


Train, Epoch 3 / 20:  22%|██▏       | 345/1563 [00:14<00:49, 24.41it/s]

batch 340 loss: 0.36933417320251466


Train, Epoch 3 / 20:  23%|██▎       | 354/1563 [00:14<00:49, 24.60it/s]

batch 350 loss: 0.32284086644649507


Train, Epoch 3 / 20:  23%|██▎       | 363/1563 [00:15<00:48, 24.50it/s]

batch 360 loss: 0.30929277688264845


Train, Epoch 3 / 20:  24%|██▍       | 375/1563 [00:15<00:47, 24.84it/s]

batch 370 loss: 0.34589322060346606


Train, Epoch 3 / 20:  25%|██▍       | 384/1563 [00:16<00:47, 24.66it/s]

batch 380 loss: 0.3166315644979477


Train, Epoch 3 / 20:  25%|██▌       | 393/1563 [00:16<00:47, 24.69it/s]

batch 390 loss: 0.23420992866158485


Train, Epoch 3 / 20:  26%|██▌       | 405/1563 [00:16<00:46, 24.76it/s]

batch 400 loss: 0.3877770289778709


Train, Epoch 3 / 20:  26%|██▋       | 414/1563 [00:17<00:46, 24.82it/s]

batch 410 loss: 0.2504273973405361


Train, Epoch 3 / 20:  27%|██▋       | 423/1563 [00:17<00:46, 24.69it/s]

batch 420 loss: 0.3162396363914013


Train, Epoch 3 / 20:  28%|██▊       | 432/1563 [00:17<00:46, 24.40it/s]

batch 430 loss: 0.4026048943400383


Train, Epoch 3 / 20:  28%|██▊       | 444/1563 [00:18<00:45, 24.75it/s]

batch 440 loss: 0.41327731907367704


Train, Epoch 3 / 20:  29%|██▉       | 453/1563 [00:18<00:45, 24.62it/s]

batch 450 loss: 0.3054276406764984


Train, Epoch 3 / 20:  30%|██▉       | 465/1563 [00:19<00:44, 24.58it/s]

batch 460 loss: 0.3260527357459068


Train, Epoch 3 / 20:  30%|███       | 474/1563 [00:19<00:43, 24.79it/s]

batch 470 loss: 0.29351040720939636


Train, Epoch 3 / 20:  31%|███       | 483/1563 [00:20<00:44, 24.35it/s]

batch 480 loss: 0.3652501180768013


Train, Epoch 3 / 20:  32%|███▏      | 495/1563 [00:20<00:43, 24.50it/s]

batch 490 loss: 0.28322947323322295


Train, Epoch 3 / 20:  32%|███▏      | 504/1563 [00:20<00:43, 24.48it/s]

batch 500 loss: 0.4385853037238121


Train, Epoch 3 / 20:  33%|███▎      | 513/1563 [00:21<00:42, 24.56it/s]

batch 510 loss: 0.2531095691025257


Train, Epoch 3 / 20:  34%|███▎      | 525/1563 [00:21<00:41, 24.72it/s]

batch 520 loss: 0.3094510644674301


Train, Epoch 3 / 20:  34%|███▍      | 534/1563 [00:22<00:41, 24.55it/s]

batch 530 loss: 0.30873587876558306


Train, Epoch 3 / 20:  35%|███▍      | 543/1563 [00:22<00:41, 24.63it/s]

batch 540 loss: 0.32180910408496854


Train, Epoch 3 / 20:  36%|███▌      | 555/1563 [00:22<00:40, 24.89it/s]

batch 550 loss: 0.3552399627864361


Train, Epoch 3 / 20:  36%|███▌      | 564/1563 [00:23<00:41, 24.32it/s]

batch 560 loss: 0.40284943357110026


Train, Epoch 3 / 20:  37%|███▋      | 573/1563 [00:23<00:42, 23.03it/s]

batch 570 loss: 0.3276021771132946


Train, Epoch 3 / 20:  37%|███▋      | 582/1563 [00:24<00:44, 22.12it/s]

batch 580 loss: 0.3794849470257759


Train, Epoch 3 / 20:  38%|███▊      | 594/1563 [00:24<00:44, 21.83it/s]

batch 590 loss: 0.3089323937892914


Train, Epoch 3 / 20:  39%|███▊      | 603/1563 [00:25<00:45, 21.17it/s]

batch 600 loss: 0.28576742559671403


Train, Epoch 3 / 20:  39%|███▉      | 615/1563 [00:25<00:41, 22.71it/s]

batch 610 loss: 0.28227822110056877


Train, Epoch 3 / 20:  40%|███▉      | 624/1563 [00:26<00:39, 23.90it/s]

batch 620 loss: 0.2893090158700943


Train, Epoch 3 / 20:  40%|████      | 633/1563 [00:26<00:38, 24.07it/s]

batch 630 loss: 0.3392844468355179


Train, Epoch 3 / 20:  41%|████▏     | 645/1563 [00:26<00:37, 24.23it/s]

batch 640 loss: 0.2731360301375389


Train, Epoch 3 / 20:  42%|████▏     | 654/1563 [00:27<00:37, 24.32it/s]

batch 650 loss: 0.40006336718797686


Train, Epoch 3 / 20:  42%|████▏     | 663/1563 [00:27<00:36, 24.54it/s]

batch 660 loss: 0.42566275745630267


Train, Epoch 3 / 20:  43%|████▎     | 675/1563 [00:28<00:35, 24.69it/s]

batch 670 loss: 0.24665458053350447


Train, Epoch 3 / 20:  44%|████▍     | 684/1563 [00:28<00:35, 24.52it/s]

batch 680 loss: 0.4303412616252899


Train, Epoch 3 / 20:  44%|████▍     | 693/1563 [00:28<00:36, 24.06it/s]

batch 690 loss: 0.3093492701649666


Train, Epoch 3 / 20:  45%|████▍     | 702/1563 [00:29<00:35, 24.41it/s]

batch 700 loss: 0.35091327130794525


Train, Epoch 3 / 20:  46%|████▌     | 714/1563 [00:29<00:34, 24.59it/s]

batch 710 loss: 0.3815924659371376


Train, Epoch 3 / 20:  46%|████▋     | 723/1563 [00:30<00:34, 24.67it/s]

batch 720 loss: 0.27344470769166945


Train, Epoch 3 / 20:  47%|████▋     | 735/1563 [00:30<00:33, 24.72it/s]

batch 730 loss: 0.3578189998865128


Train, Epoch 3 / 20:  48%|████▊     | 744/1563 [00:30<00:33, 24.72it/s]

batch 740 loss: 0.3365393102169037


Train, Epoch 3 / 20:  48%|████▊     | 753/1563 [00:31<00:34, 23.82it/s]

batch 750 loss: 0.25519910007715224


Train, Epoch 3 / 20:  49%|████▉     | 762/1563 [00:31<00:32, 24.29it/s]

batch 760 loss: 0.2952660322189331


Train, Epoch 3 / 20:  50%|████▉     | 774/1563 [00:32<00:32, 24.30it/s]

batch 770 loss: 0.340319462120533


Train, Epoch 3 / 20:  50%|█████     | 783/1563 [00:32<00:32, 24.32it/s]

batch 780 loss: 0.31554832458496096


Train, Epoch 3 / 20:  51%|█████     | 795/1563 [00:33<00:31, 24.58it/s]

batch 790 loss: 0.2985275939106941


Train, Epoch 3 / 20:  51%|█████▏    | 804/1563 [00:33<00:30, 24.56it/s]

batch 800 loss: 0.3077985063195229


Train, Epoch 3 / 20:  52%|█████▏    | 813/1563 [00:33<00:30, 24.56it/s]

batch 810 loss: 0.28665361031889913


Train, Epoch 3 / 20:  53%|█████▎    | 825/1563 [00:34<00:29, 24.67it/s]

batch 820 loss: 0.3243150696158409


Train, Epoch 3 / 20:  53%|█████▎    | 834/1563 [00:34<00:29, 24.65it/s]

batch 830 loss: 0.2197428748011589


Train, Epoch 3 / 20:  54%|█████▍    | 843/1563 [00:35<00:29, 24.19it/s]

batch 840 loss: 0.3076521523296833


Train, Epoch 3 / 20:  55%|█████▍    | 852/1563 [00:35<00:29, 24.27it/s]

batch 850 loss: 0.31809090822935104


Train, Epoch 3 / 20:  55%|█████▌    | 864/1563 [00:35<00:30, 22.63it/s]

batch 860 loss: 0.31695033609867096


Train, Epoch 3 / 20:  56%|█████▌    | 873/1563 [00:36<00:31, 22.12it/s]

batch 870 loss: 0.252034392952919


Train, Epoch 3 / 20:  56%|█████▋    | 882/1563 [00:36<00:31, 21.68it/s]

batch 880 loss: 0.29783681482076646


Train, Epoch 3 / 20:  57%|█████▋    | 894/1563 [00:37<00:30, 21.78it/s]

batch 890 loss: 0.33944964706897734


Train, Epoch 3 / 20:  58%|█████▊    | 903/1563 [00:37<00:29, 22.73it/s]

batch 900 loss: 0.3385383188724518


Train, Epoch 3 / 20:  59%|█████▊    | 915/1563 [00:38<00:26, 24.15it/s]

batch 910 loss: 0.3590881258249283


Train, Epoch 3 / 20:  59%|█████▉    | 924/1563 [00:38<00:26, 24.40it/s]

batch 920 loss: 0.28319277316331865


Train, Epoch 3 / 20:  60%|█████▉    | 933/1563 [00:38<00:25, 24.70it/s]

batch 930 loss: 0.3613795228302479


Train, Epoch 3 / 20:  60%|██████    | 945/1563 [00:39<00:25, 24.68it/s]

batch 940 loss: 0.29613447189331055


Train, Epoch 3 / 20:  61%|██████    | 954/1563 [00:39<00:24, 24.68it/s]

batch 950 loss: 0.29387338310480116


Train, Epoch 3 / 20:  62%|██████▏   | 963/1563 [00:40<00:24, 24.77it/s]

batch 960 loss: 0.28716719299554827


Train, Epoch 3 / 20:  62%|██████▏   | 972/1563 [00:40<00:23, 24.66it/s]

batch 970 loss: 0.3250977858901024


Train, Epoch 3 / 20:  63%|██████▎   | 984/1563 [00:41<00:23, 24.83it/s]

batch 980 loss: 0.3319790229201317


Train, Epoch 3 / 20:  64%|██████▎   | 993/1563 [00:41<00:23, 24.74it/s]

batch 990 loss: 0.24324941337108613


Train, Epoch 3 / 20:  64%|██████▍   | 1005/1563 [00:41<00:22, 24.74it/s]

batch 1000 loss: 0.34231643080711366


Train, Epoch 3 / 20:  65%|██████▍   | 1014/1563 [00:42<00:22, 24.78it/s]

batch 1010 loss: 0.3086524836719036


Train, Epoch 3 / 20:  65%|██████▌   | 1023/1563 [00:42<00:21, 24.79it/s]

batch 1020 loss: 0.3180746003985405


Train, Epoch 3 / 20:  66%|██████▌   | 1032/1563 [00:42<00:21, 24.62it/s]

batch 1030 loss: 0.26858790293335916


Train, Epoch 3 / 20:  67%|██████▋   | 1044/1563 [00:43<00:20, 24.83it/s]

batch 1040 loss: 0.35080331563949585


Train, Epoch 3 / 20:  67%|██████▋   | 1053/1563 [00:43<00:20, 24.81it/s]

batch 1050 loss: 0.3386603593826294


Train, Epoch 3 / 20:  68%|██████▊   | 1065/1563 [00:44<00:20, 24.75it/s]

batch 1060 loss: 0.30549328923225405


Train, Epoch 3 / 20:  69%|██████▊   | 1074/1563 [00:44<00:19, 24.71it/s]

batch 1070 loss: 0.27650130689144137


Train, Epoch 3 / 20:  69%|██████▉   | 1083/1563 [00:45<00:19, 24.74it/s]

batch 1080 loss: 0.3059900477528572


Train, Epoch 3 / 20:  70%|███████   | 1095/1563 [00:45<00:18, 24.76it/s]

batch 1090 loss: 0.28130612447857856


Train, Epoch 3 / 20:  71%|███████   | 1104/1563 [00:45<00:18, 24.80it/s]

batch 1100 loss: 0.33677469193935394


Train, Epoch 3 / 20:  71%|███████   | 1113/1563 [00:46<00:18, 24.70it/s]

batch 1110 loss: 0.31571459472179414


Train, Epoch 3 / 20:  72%|███████▏  | 1122/1563 [00:46<00:17, 24.85it/s]

batch 1120 loss: 0.3065657317638397


Train, Epoch 3 / 20:  73%|███████▎  | 1134/1563 [00:47<00:17, 24.78it/s]

batch 1130 loss: 0.3380445301532745


Train, Epoch 3 / 20:  73%|███████▎  | 1143/1563 [00:47<00:16, 24.78it/s]

batch 1140 loss: 0.36422847807407377


Train, Epoch 3 / 20:  74%|███████▎  | 1152/1563 [00:47<00:17, 23.22it/s]

batch 1150 loss: 0.2919066809117794


Train, Epoch 3 / 20:  74%|███████▍  | 1164/1563 [00:48<00:18, 22.11it/s]

batch 1160 loss: 0.31241179406642916


Train, Epoch 3 / 20:  75%|███████▌  | 1173/1563 [00:48<00:17, 22.55it/s]

batch 1170 loss: 0.30745984315872193


Train, Epoch 3 / 20:  76%|███████▌  | 1182/1563 [00:49<00:17, 22.13it/s]

batch 1180 loss: 0.3420543894171715


Train, Epoch 3 / 20:  76%|███████▋  | 1194/1563 [00:49<00:16, 22.13it/s]

batch 1190 loss: 0.2991744354367256


Train, Epoch 3 / 20:  77%|███████▋  | 1203/1563 [00:50<00:15, 23.63it/s]

batch 1200 loss: 0.31481511816382407


Train, Epoch 3 / 20:  78%|███████▊  | 1215/1563 [00:50<00:14, 24.53it/s]

batch 1210 loss: 0.29252017587423323


Train, Epoch 3 / 20:  78%|███████▊  | 1224/1563 [00:51<00:13, 24.25it/s]

batch 1220 loss: 0.32696756049990655


Train, Epoch 3 / 20:  79%|███████▉  | 1233/1563 [00:51<00:13, 24.50it/s]

batch 1230 loss: 0.37615028470754625


Train, Epoch 3 / 20:  80%|███████▉  | 1245/1563 [00:51<00:12, 24.72it/s]

batch 1240 loss: 0.40031903237104416


Train, Epoch 3 / 20:  80%|████████  | 1254/1563 [00:52<00:12, 24.71it/s]

batch 1250 loss: 0.2911130219697952


Train, Epoch 3 / 20:  81%|████████  | 1263/1563 [00:52<00:12, 24.72it/s]

batch 1260 loss: 0.2902273803949356


Train, Epoch 3 / 20:  82%|████████▏ | 1275/1563 [00:53<00:11, 24.73it/s]

batch 1270 loss: 0.28332229927182195


Train, Epoch 3 / 20:  82%|████████▏ | 1284/1563 [00:53<00:11, 24.72it/s]

batch 1280 loss: 0.2493879273533821


Train, Epoch 3 / 20:  83%|████████▎ | 1293/1563 [00:53<00:10, 24.55it/s]

batch 1290 loss: 0.30495256930589676


Train, Epoch 3 / 20:  83%|████████▎ | 1305/1563 [00:54<00:10, 24.73it/s]

batch 1300 loss: 0.2854018434882164


Train, Epoch 3 / 20:  84%|████████▍ | 1314/1563 [00:54<00:10, 24.73it/s]

batch 1310 loss: 0.3439402639865875


Train, Epoch 3 / 20:  85%|████████▍ | 1323/1563 [00:55<00:09, 24.53it/s]

batch 1320 loss: 0.30578130185604097


Train, Epoch 3 / 20:  85%|████████▌ | 1335/1563 [00:55<00:09, 24.79it/s]

batch 1330 loss: 0.32966781258583067


Train, Epoch 3 / 20:  86%|████████▌ | 1344/1563 [00:55<00:08, 24.73it/s]

batch 1340 loss: 0.2833933748304844


Train, Epoch 3 / 20:  87%|████████▋ | 1353/1563 [00:56<00:08, 24.44it/s]

batch 1350 loss: 0.3103933557868004


Train, Epoch 3 / 20:  87%|████████▋ | 1362/1563 [00:56<00:08, 24.63it/s]

batch 1360 loss: 0.330221613496542


Train, Epoch 3 / 20:  88%|████████▊ | 1374/1563 [00:57<00:07, 24.29it/s]

batch 1370 loss: 0.2755622535943985


Train, Epoch 3 / 20:  88%|████████▊ | 1383/1563 [00:57<00:07, 24.49it/s]

batch 1380 loss: 0.3566749632358551


Train, Epoch 3 / 20:  89%|████████▉ | 1395/1563 [00:57<00:06, 24.59it/s]

batch 1390 loss: 0.3331379473209381


Train, Epoch 3 / 20:  90%|████████▉ | 1404/1563 [00:58<00:06, 24.46it/s]

batch 1400 loss: 0.3743728488683701


Train, Epoch 3 / 20:  90%|█████████ | 1413/1563 [00:58<00:06, 24.57it/s]

batch 1410 loss: 0.36250387132167816


Train, Epoch 3 / 20:  91%|█████████ | 1422/1563 [00:59<00:05, 24.09it/s]

batch 1420 loss: 0.3599502667784691


Train, Epoch 3 / 20:  92%|█████████▏| 1434/1563 [00:59<00:05, 24.54it/s]

batch 1430 loss: 0.249730284512043


Train, Epoch 3 / 20:  92%|█████████▏| 1443/1563 [00:59<00:05, 22.95it/s]

batch 1440 loss: 0.4074462503194809


Train, Epoch 3 / 20:  93%|█████████▎| 1452/1563 [01:00<00:05, 21.97it/s]

batch 1450 loss: 0.38334065675735474


Train, Epoch 3 / 20:  94%|█████████▎| 1464/1563 [01:00<00:04, 22.00it/s]

batch 1460 loss: 0.2213415816426277


Train, Epoch 3 / 20:  94%|█████████▍| 1473/1563 [01:01<00:04, 21.52it/s]

batch 1470 loss: 0.4162814453244209


Train, Epoch 3 / 20:  95%|█████████▍| 1482/1563 [01:01<00:03, 21.31it/s]

batch 1480 loss: 0.29592072069644926


Train, Epoch 3 / 20:  96%|█████████▌| 1494/1563 [01:02<00:02, 23.02it/s]

batch 1490 loss: 0.29838897883892057


Train, Epoch 3 / 20:  96%|█████████▌| 1503/1563 [01:02<00:02, 24.32it/s]

batch 1500 loss: 0.2598245330154896


Train, Epoch 3 / 20:  97%|█████████▋| 1515/1563 [01:03<00:01, 24.75it/s]

batch 1510 loss: 0.426333736628294


Train, Epoch 3 / 20:  98%|█████████▊| 1524/1563 [01:03<00:01, 24.83it/s]

batch 1520 loss: 0.3587407790124416


Train, Epoch 3 / 20:  98%|█████████▊| 1533/1563 [01:03<00:01, 24.77it/s]

batch 1530 loss: 0.27980647310614587


Train, Epoch 3 / 20:  99%|█████████▊| 1542/1563 [01:04<00:00, 24.43it/s]

batch 1540 loss: 0.32955129742622374


Train, Epoch 3 / 20:  99%|█████████▉| 1554/1563 [01:04<00:00, 24.40it/s]

batch 1550 loss: 0.2811399534344673


Train, Epoch 3 / 20: 100%|██████████| 1563/1563 [01:05<00:00, 24.00it/s]


batch 1560 loss: 0.29199896156787875


Test, Epoch 3 / 20: 100%|██████████| 1563/1563 [00:29<00:00, 53.17it/s]


Epoch 3, loss: 0.46191361332118513, accuracy: 0.79816


Train, Epoch 4 / 20:   1%|          | 15/1563 [00:00<01:02, 24.67it/s]

batch 10 loss: 0.2747302740812302


Train, Epoch 4 / 20:   2%|▏         | 24/1563 [00:00<01:02, 24.81it/s]

batch 20 loss: 0.2779908299446106


Train, Epoch 4 / 20:   2%|▏         | 33/1563 [00:01<01:01, 24.85it/s]

batch 30 loss: 0.2624199718236923


Train, Epoch 4 / 20:   3%|▎         | 45/1563 [00:01<01:01, 24.69it/s]

batch 40 loss: 0.33460202664136884


Train, Epoch 4 / 20:   3%|▎         | 54/1563 [00:02<01:05, 23.13it/s]

batch 50 loss: 0.32365805059671404


Train, Epoch 4 / 20:   4%|▍         | 63/1563 [00:02<01:07, 22.07it/s]

batch 60 loss: 0.32756701931357385


Train, Epoch 4 / 20:   5%|▍         | 72/1563 [00:03<01:08, 21.86it/s]

batch 70 loss: 0.3285084642469883


Train, Epoch 4 / 20:   5%|▌         | 84/1563 [00:03<01:07, 22.00it/s]

batch 80 loss: 0.3419452950358391


Train, Epoch 4 / 20:   6%|▌         | 93/1563 [00:04<01:06, 22.07it/s]

batch 90 loss: 0.3374593771994114


Train, Epoch 4 / 20:   7%|▋         | 105/1563 [00:04<01:00, 24.10it/s]

batch 100 loss: 0.34413502663373946


Train, Epoch 4 / 20:   7%|▋         | 114/1563 [00:04<00:58, 24.68it/s]

batch 110 loss: 0.25079243406653406


Train, Epoch 4 / 20:   8%|▊         | 123/1563 [00:05<00:58, 24.41it/s]

batch 120 loss: 0.30738621652126313


Train, Epoch 4 / 20:   9%|▊         | 135/1563 [00:05<00:57, 24.63it/s]

batch 130 loss: 0.32562273293733596


Train, Epoch 4 / 20:   9%|▉         | 144/1563 [00:06<00:57, 24.73it/s]

batch 140 loss: 0.3116340309381485


Train, Epoch 4 / 20:  10%|▉         | 153/1563 [00:06<00:56, 24.82it/s]

batch 150 loss: 0.2818165600299835


Train, Epoch 4 / 20:  11%|█         | 165/1563 [00:06<00:56, 24.59it/s]

batch 160 loss: 0.3321663707494736


Train, Epoch 4 / 20:  11%|█         | 174/1563 [00:07<00:56, 24.76it/s]

batch 170 loss: 0.23775860518217087


Train, Epoch 4 / 20:  12%|█▏        | 183/1563 [00:07<00:55, 24.87it/s]

batch 180 loss: 0.2926121518015862


Train, Epoch 4 / 20:  12%|█▏        | 195/1563 [00:08<00:55, 24.77it/s]

batch 190 loss: 0.24547101631760598


Train, Epoch 4 / 20:  13%|█▎        | 204/1563 [00:08<00:54, 24.77it/s]

batch 200 loss: 0.3227141186594963


Train, Epoch 4 / 20:  14%|█▎        | 213/1563 [00:08<00:54, 24.64it/s]

batch 210 loss: 0.2588765323162079


Train, Epoch 4 / 20:  14%|█▍        | 225/1563 [00:09<00:54, 24.77it/s]

batch 220 loss: 0.3298365533351898


Train, Epoch 4 / 20:  15%|█▍        | 234/1563 [00:09<00:53, 24.78it/s]

batch 230 loss: 0.24250303581357002


Train, Epoch 4 / 20:  16%|█▌        | 243/1563 [00:10<00:53, 24.65it/s]

batch 240 loss: 0.3822904363274574


Train, Epoch 4 / 20:  16%|█▋        | 255/1563 [00:10<00:52, 24.75it/s]

batch 250 loss: 0.2903418309986591


Train, Epoch 4 / 20:  17%|█▋        | 264/1563 [00:10<00:53, 24.33it/s]

batch 260 loss: 0.3272665038704872


Train, Epoch 4 / 20:  17%|█▋        | 273/1563 [00:11<00:52, 24.65it/s]

batch 270 loss: 0.35643915086984634


Train, Epoch 4 / 20:  18%|█▊        | 285/1563 [00:11<00:51, 24.82it/s]

batch 280 loss: 0.3138289250433445


Train, Epoch 4 / 20:  19%|█▉        | 294/1563 [00:12<00:51, 24.71it/s]

batch 290 loss: 0.33647897392511367


Train, Epoch 4 / 20:  19%|█▉        | 303/1563 [00:12<00:50, 24.75it/s]

batch 300 loss: 0.34201347529888154


Train, Epoch 4 / 20:  20%|██        | 315/1563 [00:13<00:50, 24.75it/s]

batch 310 loss: 0.2610740095376968


Train, Epoch 4 / 20:  21%|██        | 324/1563 [00:13<00:50, 24.66it/s]

batch 320 loss: 0.3047835625708103


Train, Epoch 4 / 20:  21%|██▏       | 333/1563 [00:13<00:49, 24.73it/s]

batch 330 loss: 0.32668583691120145


Train, Epoch 4 / 20:  22%|██▏       | 342/1563 [00:14<00:51, 23.92it/s]

batch 340 loss: 0.29038451313972474


Train, Epoch 4 / 20:  23%|██▎       | 354/1563 [00:14<00:54, 22.20it/s]

batch 350 loss: 0.41496695429086683


Train, Epoch 4 / 20:  23%|██▎       | 363/1563 [00:15<00:58, 20.37it/s]

batch 360 loss: 0.38416344225406646


Train, Epoch 4 / 20:  24%|██▍       | 372/1563 [00:15<00:58, 20.25it/s]

batch 370 loss: 0.34562341719865797


Train, Epoch 4 / 20:  25%|██▍       | 384/1563 [00:16<00:56, 20.95it/s]

batch 380 loss: 0.3051338195800781


Train, Epoch 4 / 20:  25%|██▌       | 393/1563 [00:16<00:50, 23.32it/s]

batch 390 loss: 0.30310437083244324


Train, Epoch 4 / 20:  26%|██▌       | 402/1563 [00:16<00:47, 24.20it/s]

batch 400 loss: 0.3386926904320717


Train, Epoch 4 / 20:  26%|██▋       | 414/1563 [00:17<00:46, 24.67it/s]

batch 410 loss: 0.4680784344673157


Train, Epoch 4 / 20:  27%|██▋       | 423/1563 [00:17<00:46, 24.74it/s]

batch 420 loss: 0.2754755116999149


Train, Epoch 4 / 20:  28%|██▊       | 432/1563 [00:18<00:46, 24.46it/s]

batch 430 loss: 0.31506065279245377


Train, Epoch 4 / 20:  28%|██▊       | 444/1563 [00:18<00:45, 24.85it/s]

batch 440 loss: 0.34617775678634644


Train, Epoch 4 / 20:  29%|██▉       | 453/1563 [00:18<00:44, 24.89it/s]

batch 450 loss: 0.3782987967133522


Train, Epoch 4 / 20:  30%|██▉       | 465/1563 [00:19<00:44, 24.54it/s]

batch 460 loss: 0.3291569799184799


Train, Epoch 4 / 20:  30%|███       | 474/1563 [00:19<00:44, 24.71it/s]

batch 470 loss: 0.2645725980401039


Train, Epoch 4 / 20:  31%|███       | 483/1563 [00:20<00:44, 24.42it/s]

batch 480 loss: 0.2745181530714035


Train, Epoch 4 / 20:  32%|███▏      | 495/1563 [00:20<00:43, 24.77it/s]

batch 490 loss: 0.27285851165652275


Train, Epoch 4 / 20:  32%|███▏      | 504/1563 [00:21<00:42, 24.83it/s]

batch 500 loss: 0.3351699233055115


Train, Epoch 4 / 20:  33%|███▎      | 513/1563 [00:21<00:42, 24.72it/s]

batch 510 loss: 0.2622246026992798


Train, Epoch 4 / 20:  34%|███▎      | 525/1563 [00:21<00:41, 24.85it/s]

batch 520 loss: 0.3006480410695076


Train, Epoch 4 / 20:  34%|███▍      | 534/1563 [00:22<00:41, 24.76it/s]

batch 530 loss: 0.3550460129976273


Train, Epoch 4 / 20:  35%|███▍      | 543/1563 [00:22<00:41, 24.75it/s]

batch 540 loss: 0.3420400172472


Train, Epoch 4 / 20:  36%|███▌      | 555/1563 [00:23<00:40, 24.85it/s]

batch 550 loss: 0.29893572106957433


Train, Epoch 4 / 20:  36%|███▌      | 564/1563 [00:23<00:40, 24.71it/s]

batch 560 loss: 0.3367889426648617


Train, Epoch 4 / 20:  37%|███▋      | 573/1563 [00:23<00:40, 24.70it/s]

batch 570 loss: 0.29309302270412446


Train, Epoch 4 / 20:  37%|███▋      | 585/1563 [00:24<00:39, 24.71it/s]

batch 580 loss: 0.2803143925964832


Train, Epoch 4 / 20:  38%|███▊      | 594/1563 [00:24<00:39, 24.80it/s]

batch 590 loss: 0.2870097205042839


Train, Epoch 4 / 20:  39%|███▊      | 603/1563 [00:25<00:38, 24.90it/s]

batch 600 loss: 0.3732319548726082


Train, Epoch 4 / 20:  39%|███▉      | 612/1563 [00:25<00:38, 24.64it/s]

batch 610 loss: 0.3103925108909607


Train, Epoch 4 / 20:  40%|███▉      | 624/1563 [00:25<00:38, 24.46it/s]

batch 620 loss: 0.2483879178762436


Train, Epoch 4 / 20:  40%|████      | 633/1563 [00:26<00:39, 23.40it/s]

batch 630 loss: 0.28635999336838724


Train, Epoch 4 / 20:  41%|████      | 642/1563 [00:26<00:41, 22.11it/s]

batch 640 loss: 0.28458329737186433


Train, Epoch 4 / 20:  42%|████▏     | 654/1563 [00:27<00:40, 22.54it/s]

batch 650 loss: 0.31929885447025297


Train, Epoch 4 / 20:  42%|████▏     | 663/1563 [00:27<00:41, 21.78it/s]

batch 660 loss: 0.3012890234589577


Train, Epoch 4 / 20:  43%|████▎     | 672/1563 [00:28<00:42, 21.13it/s]

batch 670 loss: 0.2937513001263142


Train, Epoch 4 / 20:  44%|████▍     | 684/1563 [00:28<00:38, 23.09it/s]

batch 680 loss: 0.3776655226945877


Train, Epoch 4 / 20:  44%|████▍     | 693/1563 [00:28<00:35, 24.25it/s]

batch 690 loss: 0.29362594336271286


Train, Epoch 4 / 20:  45%|████▍     | 702/1563 [00:29<00:35, 24.54it/s]

batch 700 loss: 0.27989482656121256


Train, Epoch 4 / 20:  46%|████▌     | 714/1563 [00:29<00:34, 24.71it/s]

batch 710 loss: 0.30740840137004855


Train, Epoch 4 / 20:  46%|████▋     | 723/1563 [00:30<00:33, 24.76it/s]

batch 720 loss: 0.34127330780029297


Train, Epoch 4 / 20:  47%|████▋     | 735/1563 [00:30<00:33, 24.70it/s]

batch 730 loss: 0.3204489655792713


Train, Epoch 4 / 20:  48%|████▊     | 744/1563 [00:31<00:33, 24.47it/s]

batch 740 loss: 0.2735121764242649


Train, Epoch 4 / 20:  48%|████▊     | 753/1563 [00:31<00:32, 24.64it/s]

batch 750 loss: 0.3071112722158432


Train, Epoch 4 / 20:  49%|████▉     | 765/1563 [00:31<00:32, 24.71it/s]

batch 760 loss: 0.3612604171037674


Train, Epoch 4 / 20:  50%|████▉     | 774/1563 [00:32<00:31, 24.86it/s]

batch 770 loss: 0.35669803991913795


Train, Epoch 4 / 20:  50%|█████     | 783/1563 [00:32<00:31, 24.69it/s]

batch 780 loss: 0.3008938729763031


Train, Epoch 4 / 20:  51%|█████     | 795/1563 [00:33<00:31, 24.68it/s]

batch 790 loss: 0.31526989787817


Train, Epoch 4 / 20:  51%|█████▏    | 804/1563 [00:33<00:31, 24.34it/s]

batch 800 loss: 0.25342140197753904


Train, Epoch 4 / 20:  52%|█████▏    | 813/1563 [00:33<00:30, 24.81it/s]

batch 810 loss: 0.34390575289726255


Train, Epoch 4 / 20:  53%|█████▎    | 825/1563 [00:34<00:29, 24.86it/s]

batch 820 loss: 0.28720314614474773


Train, Epoch 4 / 20:  53%|█████▎    | 834/1563 [00:34<00:29, 24.52it/s]

batch 830 loss: 0.30220688357949255


Train, Epoch 4 / 20:  54%|█████▍    | 843/1563 [00:35<00:29, 24.69it/s]

batch 840 loss: 0.29521204754710195


Train, Epoch 4 / 20:  55%|█████▍    | 852/1563 [00:35<00:28, 24.79it/s]

batch 850 loss: 0.27330139726400376


Train, Epoch 4 / 20:  55%|█████▌    | 864/1563 [00:35<00:28, 24.76it/s]

batch 860 loss: 0.2972702421247959


Train, Epoch 4 / 20:  56%|█████▌    | 873/1563 [00:36<00:27, 24.90it/s]

batch 870 loss: 0.3309099555015564


Train, Epoch 4 / 20:  57%|█████▋    | 885/1563 [00:36<00:27, 24.80it/s]

batch 880 loss: 0.24806026443839074


Train, Epoch 4 / 20:  57%|█████▋    | 894/1563 [00:37<00:27, 24.57it/s]

batch 890 loss: 0.35933089107275007


Train, Epoch 4 / 20:  58%|█████▊    | 903/1563 [00:37<00:26, 24.64it/s]

batch 900 loss: 0.34238729774951937


Train, Epoch 4 / 20:  59%|█████▊    | 915/1563 [00:37<00:26, 24.82it/s]

batch 910 loss: 0.26637869998812674


Train, Epoch 4 / 20:  59%|█████▉    | 924/1563 [00:38<00:25, 24.71it/s]

batch 920 loss: 0.30802459865808485


Train, Epoch 4 / 20:  60%|█████▉    | 933/1563 [00:38<00:27, 22.71it/s]

batch 930 loss: 0.38431568294763563


Train, Epoch 4 / 20:  60%|██████    | 942/1563 [00:39<00:28, 21.98it/s]

batch 940 loss: 0.2597604110836983


Train, Epoch 4 / 20:  61%|██████    | 954/1563 [00:39<00:27, 21.80it/s]

batch 950 loss: 0.3131450653076172


Train, Epoch 4 / 20:  62%|██████▏   | 963/1563 [00:40<00:27, 21.75it/s]

batch 960 loss: 0.31020401418209076


Train, Epoch 4 / 20:  62%|██████▏   | 972/1563 [00:40<00:26, 22.25it/s]

batch 970 loss: 0.30740406811237336


Train, Epoch 4 / 20:  63%|██████▎   | 984/1563 [00:41<00:23, 24.26it/s]

batch 980 loss: 0.26374110728502276


Train, Epoch 4 / 20:  64%|██████▎   | 993/1563 [00:41<00:23, 24.58it/s]

batch 990 loss: 0.30994800925254823


Train, Epoch 4 / 20:  64%|██████▍   | 1005/1563 [00:41<00:22, 24.55it/s]

batch 1000 loss: 0.32430636063218116


Train, Epoch 4 / 20:  65%|██████▍   | 1014/1563 [00:42<00:22, 24.73it/s]

batch 1010 loss: 0.28453618437051775


Train, Epoch 4 / 20:  65%|██████▌   | 1023/1563 [00:42<00:21, 24.82it/s]

batch 1020 loss: 0.36320057362318037


Train, Epoch 4 / 20:  66%|██████▌   | 1035/1563 [00:43<00:21, 24.66it/s]

batch 1030 loss: 0.2699805095791817


Train, Epoch 4 / 20:  67%|██████▋   | 1044/1563 [00:43<00:21, 24.60it/s]

batch 1040 loss: 0.2574846476316452


Train, Epoch 4 / 20:  67%|██████▋   | 1053/1563 [00:43<00:20, 24.63it/s]

batch 1050 loss: 0.33039209693670274


Train, Epoch 4 / 20:  68%|██████▊   | 1065/1563 [00:44<00:20, 24.83it/s]

batch 1060 loss: 0.3254571445286274


Train, Epoch 4 / 20:  69%|██████▊   | 1074/1563 [00:44<00:19, 24.84it/s]

batch 1070 loss: 0.29286663979291916


Train, Epoch 4 / 20:  69%|██████▉   | 1083/1563 [00:45<00:19, 24.80it/s]

batch 1080 loss: 0.2576608471572399


Train, Epoch 4 / 20:  70%|███████   | 1095/1563 [00:45<00:18, 24.76it/s]

batch 1090 loss: 0.2559959478676319


Train, Epoch 4 / 20:  71%|███████   | 1104/1563 [00:45<00:18, 24.54it/s]

batch 1100 loss: 0.3128915518522263


Train, Epoch 4 / 20:  71%|███████   | 1113/1563 [00:46<00:18, 24.77it/s]

batch 1110 loss: 0.3518818110227585


Train, Epoch 4 / 20:  72%|███████▏  | 1125/1563 [00:46<00:17, 24.90it/s]

batch 1120 loss: 0.3004509650170803


Train, Epoch 4 / 20:  73%|███████▎  | 1134/1563 [00:47<00:17, 24.69it/s]

batch 1130 loss: 0.4004813492298126


Train, Epoch 4 / 20:  73%|███████▎  | 1143/1563 [00:47<00:16, 24.82it/s]

batch 1140 loss: 0.3314893513917923


Train, Epoch 4 / 20:  74%|███████▎  | 1152/1563 [00:47<00:16, 24.45it/s]

batch 1150 loss: 0.26314360499382017


Train, Epoch 4 / 20:  74%|███████▍  | 1164/1563 [00:48<00:16, 24.78it/s]

batch 1160 loss: 0.30591697692871095


Train, Epoch 4 / 20:  75%|███████▌  | 1173/1563 [00:48<00:15, 24.66it/s]

batch 1170 loss: 0.2507130205631256


Train, Epoch 4 / 20:  76%|███████▌  | 1185/1563 [00:49<00:15, 24.63it/s]

batch 1180 loss: 0.3456022828817368


Train, Epoch 4 / 20:  76%|███████▋  | 1194/1563 [00:49<00:14, 24.67it/s]

batch 1190 loss: 0.37395156025886533


Train, Epoch 4 / 20:  77%|███████▋  | 1203/1563 [00:49<00:14, 24.67it/s]

batch 1200 loss: 0.2984802410006523


Train, Epoch 4 / 20:  78%|███████▊  | 1215/1563 [00:50<00:13, 24.88it/s]

batch 1210 loss: 0.3198609724640846


Train, Epoch 4 / 20:  78%|███████▊  | 1224/1563 [00:50<00:14, 23.41it/s]

batch 1220 loss: 0.31764824986457824


Train, Epoch 4 / 20:  79%|███████▉  | 1233/1563 [00:51<00:14, 22.71it/s]

batch 1230 loss: 0.3531113341450691


Train, Epoch 4 / 20:  79%|███████▉  | 1242/1563 [00:51<00:14, 21.87it/s]

batch 1240 loss: 0.33174494951963424


Train, Epoch 4 / 20:  80%|████████  | 1254/1563 [00:52<00:14, 21.82it/s]

batch 1250 loss: 0.29477608278393747


Train, Epoch 4 / 20:  81%|████████  | 1263/1563 [00:52<00:14, 21.18it/s]

batch 1260 loss: 0.34918485283851625


Train, Epoch 4 / 20:  81%|████████▏ | 1272/1563 [00:52<00:12, 23.12it/s]

batch 1270 loss: 0.27826168984174726


Train, Epoch 4 / 20:  82%|████████▏ | 1284/1563 [00:53<00:11, 24.39it/s]

batch 1280 loss: 0.29885939359664915


Train, Epoch 4 / 20:  83%|████████▎ | 1293/1563 [00:53<00:10, 24.55it/s]

batch 1290 loss: 0.3179220102727413


Train, Epoch 4 / 20:  83%|████████▎ | 1305/1563 [00:54<00:10, 24.72it/s]

batch 1300 loss: 0.36748779863119124


Train, Epoch 4 / 20:  84%|████████▍ | 1314/1563 [00:54<00:10, 24.85it/s]

batch 1310 loss: 0.30974568128585817


Train, Epoch 4 / 20:  85%|████████▍ | 1323/1563 [00:55<00:09, 24.64it/s]

batch 1320 loss: 0.3216960996389389


Train, Epoch 4 / 20:  85%|████████▌ | 1335/1563 [00:55<00:09, 24.74it/s]

batch 1330 loss: 0.3242617458105087


Train, Epoch 4 / 20:  86%|████████▌ | 1344/1563 [00:55<00:08, 24.84it/s]

batch 1340 loss: 0.346371679008007


Train, Epoch 4 / 20:  87%|████████▋ | 1353/1563 [00:56<00:08, 24.71it/s]

batch 1350 loss: 0.20723241940140724


Train, Epoch 4 / 20:  87%|████████▋ | 1365/1563 [00:56<00:08, 24.71it/s]

batch 1360 loss: 0.28887921050190923


Train, Epoch 4 / 20:  88%|████████▊ | 1374/1563 [00:57<00:07, 24.82it/s]

batch 1370 loss: 0.28449536934494973


Train, Epoch 4 / 20:  88%|████████▊ | 1383/1563 [00:57<00:07, 24.75it/s]

batch 1380 loss: 0.31253954768180847


Train, Epoch 4 / 20:  89%|████████▉ | 1392/1563 [00:57<00:06, 24.54it/s]

batch 1390 loss: 0.3229449734091759


Train, Epoch 4 / 20:  90%|████████▉ | 1404/1563 [00:58<00:06, 24.30it/s]

batch 1400 loss: 0.37703456580638883


Train, Epoch 4 / 20:  90%|█████████ | 1413/1563 [00:58<00:06, 24.45it/s]

batch 1410 loss: 0.4055956467986107


Train, Epoch 4 / 20:  91%|█████████ | 1422/1563 [00:59<00:05, 24.33it/s]

batch 1420 loss: 0.2734722658991814


Train, Epoch 4 / 20:  92%|█████████▏| 1434/1563 [00:59<00:05, 24.53it/s]

batch 1430 loss: 0.30372281596064565


Train, Epoch 4 / 20:  92%|█████████▏| 1443/1563 [00:59<00:04, 24.83it/s]

batch 1440 loss: 0.3085404023528099


Train, Epoch 4 / 20:  93%|█████████▎| 1455/1563 [01:00<00:04, 24.69it/s]

batch 1450 loss: 0.2728892207145691


Train, Epoch 4 / 20:  94%|█████████▎| 1464/1563 [01:00<00:04, 24.68it/s]

batch 1460 loss: 0.2582352988421917


Train, Epoch 4 / 20:  94%|█████████▍| 1473/1563 [01:01<00:03, 24.60it/s]

batch 1470 loss: 0.36131943613290785


Train, Epoch 4 / 20:  95%|█████████▌| 1485/1563 [01:01<00:03, 24.77it/s]

batch 1480 loss: 0.2759132131934166


Train, Epoch 4 / 20:  96%|█████████▌| 1494/1563 [01:01<00:02, 24.98it/s]

batch 1490 loss: 0.3726701259613037


Train, Epoch 4 / 20:  96%|█████████▌| 1503/1563 [01:02<00:02, 24.79it/s]

batch 1500 loss: 0.38334828019142153


Train, Epoch 4 / 20:  97%|█████████▋| 1512/1563 [01:02<00:02, 24.78it/s]

batch 1510 loss: 0.3308547087013721


Train, Epoch 4 / 20:  98%|█████████▊| 1524/1563 [01:03<00:01, 22.55it/s]

batch 1520 loss: 0.3479881212115288


Train, Epoch 4 / 20:  98%|█████████▊| 1533/1563 [01:03<00:01, 22.60it/s]

batch 1530 loss: 0.31366133093833926


Train, Epoch 4 / 20:  99%|█████████▊| 1542/1563 [01:04<00:00, 22.94it/s]

batch 1540 loss: 0.2692111149430275


Train, Epoch 4 / 20:  99%|█████████▉| 1554/1563 [01:04<00:00, 22.25it/s]

batch 1550 loss: 0.3660686001181602


Train, Epoch 4 / 20: 100%|██████████| 1563/1563 [01:04<00:00, 24.05it/s]


batch 1560 loss: 0.2886709086596966


Test, Epoch 4 / 20: 100%|██████████| 1563/1563 [00:29<00:00, 53.17it/s]


Epoch 4, loss: 0.44013975133240224, accuracy: 0.80632


Train, Epoch 5 / 20:   1%|          | 15/1563 [00:00<01:02, 24.86it/s]

batch 10 loss: 0.34234242886304855


Train, Epoch 5 / 20:   2%|▏         | 24/1563 [00:00<01:02, 24.69it/s]

batch 20 loss: 0.2300398215651512


Train, Epoch 5 / 20:   2%|▏         | 33/1563 [00:01<01:02, 24.36it/s]

batch 30 loss: 0.3107479438185692


Train, Epoch 5 / 20:   3%|▎         | 42/1563 [00:01<01:03, 24.04it/s]

batch 40 loss: 0.2522423006594181


Train, Epoch 5 / 20:   3%|▎         | 54/1563 [00:02<01:01, 24.70it/s]

batch 50 loss: 0.25652572736144064


Train, Epoch 5 / 20:   4%|▍         | 63/1563 [00:02<01:00, 24.67it/s]

batch 60 loss: 0.2867560774087906


Train, Epoch 5 / 20:   5%|▍         | 75/1563 [00:03<01:00, 24.61it/s]

batch 70 loss: 0.3540546908974648


Train, Epoch 5 / 20:   5%|▌         | 84/1563 [00:03<00:59, 24.72it/s]

batch 80 loss: 0.2831575982272625


Train, Epoch 5 / 20:   6%|▌         | 93/1563 [00:03<00:59, 24.84it/s]

batch 90 loss: 0.2981823481619358


Train, Epoch 5 / 20:   7%|▋         | 105/1563 [00:04<00:59, 24.67it/s]

batch 100 loss: 0.34576377272605896


Train, Epoch 5 / 20:   7%|▋         | 114/1563 [00:04<00:58, 24.59it/s]

batch 110 loss: 0.29860562831163406


Train, Epoch 5 / 20:   8%|▊         | 123/1563 [00:05<00:58, 24.54it/s]

batch 120 loss: 0.3018780767917633


Train, Epoch 5 / 20:   8%|▊         | 132/1563 [00:05<01:03, 22.55it/s]

batch 130 loss: 0.3444321006536484


Train, Epoch 5 / 20:   9%|▉         | 144/1563 [00:05<01:04, 21.96it/s]

batch 140 loss: 0.3414796903729439


Train, Epoch 5 / 20:  10%|▉         | 153/1563 [00:06<01:02, 22.44it/s]

batch 150 loss: 0.2736751653254032


Train, Epoch 5 / 20:  10%|█         | 162/1563 [00:06<01:04, 21.80it/s]

batch 160 loss: 0.31468254774808885


Train, Epoch 5 / 20:  11%|█         | 174/1563 [00:07<01:01, 22.51it/s]

batch 170 loss: 0.3474077984690666


Train, Epoch 5 / 20:  12%|█▏        | 183/1563 [00:07<00:57, 24.00it/s]

batch 180 loss: 0.38625444620847704


Train, Epoch 5 / 20:  12%|█▏        | 195/1563 [00:08<00:55, 24.49it/s]

batch 190 loss: 0.4111117526888847


Train, Epoch 5 / 20:  13%|█▎        | 204/1563 [00:08<00:54, 24.86it/s]

batch 200 loss: 0.22298244535923004


Train, Epoch 5 / 20:  14%|█▎        | 213/1563 [00:08<00:54, 24.81it/s]

batch 210 loss: 0.3658934056758881


Train, Epoch 5 / 20:  14%|█▍        | 225/1563 [00:09<00:54, 24.69it/s]

batch 220 loss: 0.37160066366195676


Train, Epoch 5 / 20:  15%|█▍        | 234/1563 [00:09<00:53, 24.91it/s]

batch 230 loss: 0.29977166205644606


Train, Epoch 5 / 20:  16%|█▌        | 243/1563 [00:10<00:54, 24.42it/s]

batch 240 loss: 0.3310542434453964


Train, Epoch 5 / 20:  16%|█▋        | 255/1563 [00:10<00:52, 24.69it/s]

batch 250 loss: 0.329913467168808


Train, Epoch 5 / 20:  17%|█▋        | 264/1563 [00:11<00:52, 24.53it/s]

batch 260 loss: 0.2751699276268482


Train, Epoch 5 / 20:  17%|█▋        | 273/1563 [00:11<00:52, 24.49it/s]

batch 270 loss: 0.3513511151075363


Train, Epoch 5 / 20:  18%|█▊        | 282/1563 [00:11<00:52, 24.49it/s]

batch 280 loss: 0.3224359735846519


Train, Epoch 5 / 20:  19%|█▉        | 294/1563 [00:12<00:51, 24.50it/s]

batch 290 loss: 0.34882123917341235


Train, Epoch 5 / 20:  19%|█▉        | 303/1563 [00:12<00:51, 24.60it/s]

batch 300 loss: 0.23991989269852637


Train, Epoch 5 / 20:  20%|██        | 315/1563 [00:13<00:50, 24.79it/s]

batch 310 loss: 0.2345087707042694


Train, Epoch 5 / 20:  21%|██        | 324/1563 [00:13<00:49, 24.81it/s]

batch 320 loss: 0.2795940928161144


Train, Epoch 5 / 20:  21%|██▏       | 333/1563 [00:13<00:49, 24.79it/s]

batch 330 loss: 0.3108551323413849


Train, Epoch 5 / 20:  22%|██▏       | 345/1563 [00:14<00:49, 24.61it/s]

batch 340 loss: 0.29652981609106066


Train, Epoch 5 / 20:  23%|██▎       | 354/1563 [00:14<00:49, 24.63it/s]

batch 350 loss: 0.3312553584575653


Train, Epoch 5 / 20:  23%|██▎       | 363/1563 [00:15<00:48, 24.59it/s]

batch 360 loss: 0.33916857540607454


Train, Epoch 5 / 20:  24%|██▍       | 375/1563 [00:15<00:47, 24.87it/s]

batch 370 loss: 0.3003019727766514


Train, Epoch 5 / 20:  25%|██▍       | 384/1563 [00:15<00:47, 24.92it/s]

batch 380 loss: 0.3556005984544754


Train, Epoch 5 / 20:  25%|██▌       | 393/1563 [00:16<00:47, 24.63it/s]

batch 390 loss: 0.29078152775764465


Train, Epoch 5 / 20:  26%|██▌       | 405/1563 [00:16<00:46, 24.74it/s]

batch 400 loss: 0.29333810061216353


Train, Epoch 5 / 20:  26%|██▋       | 414/1563 [00:17<00:46, 24.68it/s]

batch 410 loss: 0.3437120884656906


Train, Epoch 5 / 20:  27%|██▋       | 423/1563 [00:17<00:49, 23.16it/s]

batch 420 loss: 0.253972315043211


Train, Epoch 5 / 20:  28%|██▊       | 432/1563 [00:17<00:50, 22.33it/s]

batch 430 loss: 0.2942077063024044


Train, Epoch 5 / 20:  28%|██▊       | 444/1563 [00:18<00:51, 21.53it/s]

batch 440 loss: 0.29308581352233887


Train, Epoch 5 / 20:  29%|██▉       | 453/1563 [00:18<00:52, 21.28it/s]

batch 450 loss: 0.3205569013953209


Train, Epoch 5 / 20:  30%|██▉       | 462/1563 [00:19<00:51, 21.19it/s]

batch 460 loss: 0.3326145336031914


Train, Epoch 5 / 20:  30%|███       | 474/1563 [00:19<00:45, 23.86it/s]

batch 470 loss: 0.2717342011630535


Train, Epoch 5 / 20:  31%|███       | 483/1563 [00:20<00:44, 24.49it/s]

batch 480 loss: 0.2751464664936066


Train, Epoch 5 / 20:  32%|███▏      | 495/1563 [00:20<00:43, 24.71it/s]

batch 490 loss: 0.3347086012363434


Train, Epoch 5 / 20:  32%|███▏      | 504/1563 [00:21<00:42, 24.81it/s]

batch 500 loss: 0.25464864894747735


Train, Epoch 5 / 20:  33%|███▎      | 513/1563 [00:21<00:43, 24.42it/s]

batch 510 loss: 0.24984567388892173


Train, Epoch 5 / 20:  33%|███▎      | 522/1563 [00:21<00:42, 24.64it/s]

batch 520 loss: 0.3355471059679985


Train, Epoch 5 / 20:  34%|███▍      | 534/1563 [00:22<00:41, 24.69it/s]

batch 530 loss: 0.3029136322438717


Train, Epoch 5 / 20:  35%|███▍      | 543/1563 [00:22<00:41, 24.83it/s]

batch 540 loss: 0.24532071948051454


Train, Epoch 5 / 20:  36%|███▌      | 555/1563 [00:23<00:40, 24.87it/s]

batch 550 loss: 0.2697257995605469


Train, Epoch 5 / 20:  36%|███▌      | 564/1563 [00:23<00:40, 24.77it/s]

batch 560 loss: 0.282724891602993


Train, Epoch 5 / 20:  37%|███▋      | 573/1563 [00:23<00:39, 24.80it/s]

batch 570 loss: 0.29605382680892944


Train, Epoch 5 / 20:  37%|███▋      | 585/1563 [00:24<00:40, 24.41it/s]

batch 580 loss: 0.23891208469867706


Train, Epoch 5 / 20:  38%|███▊      | 594/1563 [00:24<00:39, 24.68it/s]

batch 590 loss: 0.3695391476154327


Train, Epoch 5 / 20:  39%|███▊      | 603/1563 [00:25<00:38, 24.69it/s]

batch 600 loss: 0.27428620904684065


Train, Epoch 5 / 20:  39%|███▉      | 615/1563 [00:25<00:38, 24.81it/s]

batch 610 loss: 0.3041585460305214


Train, Epoch 5 / 20:  40%|███▉      | 624/1563 [00:25<00:37, 24.95it/s]

batch 620 loss: 0.29456393122673036


Train, Epoch 5 / 20:  40%|████      | 633/1563 [00:26<00:37, 24.80it/s]

batch 630 loss: 0.3143278807401657


Train, Epoch 5 / 20:  41%|████▏     | 645/1563 [00:26<00:37, 24.54it/s]

batch 640 loss: 0.3173105478286743


Train, Epoch 5 / 20:  42%|████▏     | 654/1563 [00:27<00:36, 24.73it/s]

batch 650 loss: 0.2667364113032818


Train, Epoch 5 / 20:  42%|████▏     | 663/1563 [00:27<00:36, 24.77it/s]

batch 660 loss: 0.30421170592308044


Train, Epoch 5 / 20:  43%|████▎     | 675/1563 [00:27<00:35, 24.91it/s]

batch 670 loss: 0.26751993373036387


Train, Epoch 5 / 20:  44%|████▍     | 684/1563 [00:28<00:35, 24.84it/s]

batch 680 loss: 0.26362729221582415


Train, Epoch 5 / 20:  44%|████▍     | 693/1563 [00:28<00:35, 24.70it/s]

batch 690 loss: 0.2635065712034702


Train, Epoch 5 / 20:  45%|████▌     | 705/1563 [00:29<00:34, 24.83it/s]

batch 700 loss: 0.26357533037662506


Train, Epoch 5 / 20:  46%|████▌     | 714/1563 [00:29<00:36, 22.96it/s]

batch 710 loss: 0.2418940968811512


Train, Epoch 5 / 20:  46%|████▋     | 723/1563 [00:29<00:36, 22.82it/s]

batch 720 loss: 0.2960028171539307


Train, Epoch 5 / 20:  47%|████▋     | 732/1563 [00:30<00:36, 22.55it/s]

batch 730 loss: 0.2735727697610855


Train, Epoch 5 / 20:  48%|████▊     | 744/1563 [00:30<00:36, 22.66it/s]

batch 740 loss: 0.32178683280944825


Train, Epoch 5 / 20:  48%|████▊     | 753/1563 [00:31<00:35, 22.52it/s]

batch 750 loss: 0.2714178889989853


Train, Epoch 5 / 20:  49%|████▉     | 765/1563 [00:31<00:34, 23.19it/s]

batch 760 loss: 0.3627549260854721


Train, Epoch 5 / 20:  50%|████▉     | 774/1563 [00:32<00:32, 24.50it/s]

batch 770 loss: 0.25062874183058736


Train, Epoch 5 / 20:  50%|█████     | 783/1563 [00:32<00:31, 24.66it/s]

batch 780 loss: 0.29661973640322686


Train, Epoch 5 / 20:  51%|█████     | 795/1563 [00:33<00:31, 24.74it/s]

batch 790 loss: 0.3441406324505806


Train, Epoch 5 / 20:  51%|█████▏    | 804/1563 [00:33<00:30, 24.74it/s]

batch 800 loss: 0.32256433069705964


Train, Epoch 5 / 20:  52%|█████▏    | 813/1563 [00:33<00:30, 24.61it/s]

batch 810 loss: 0.30513132363557816


Train, Epoch 5 / 20:  53%|█████▎    | 825/1563 [00:34<00:29, 24.91it/s]

batch 820 loss: 0.3177977129817009


Train, Epoch 5 / 20:  53%|█████▎    | 834/1563 [00:34<00:29, 24.91it/s]

batch 830 loss: 0.34347852170467374


Train, Epoch 5 / 20:  54%|█████▍    | 843/1563 [00:34<00:29, 24.71it/s]

batch 840 loss: 0.26779027879238126


Train, Epoch 5 / 20:  55%|█████▍    | 855/1563 [00:35<00:28, 24.81it/s]

batch 850 loss: 0.24447937086224555


Train, Epoch 5 / 20:  55%|█████▌    | 864/1563 [00:35<00:28, 24.29it/s]

batch 860 loss: 0.2596317507326603


Train, Epoch 5 / 20:  56%|█████▌    | 873/1563 [00:36<00:27, 24.65it/s]

batch 870 loss: 0.3194355398416519


Train, Epoch 5 / 20:  57%|█████▋    | 885/1563 [00:36<00:27, 24.87it/s]

batch 880 loss: 0.28711396753787993


Train, Epoch 5 / 20:  57%|█████▋    | 894/1563 [00:37<00:27, 24.73it/s]

batch 890 loss: 0.34131294712424276


Train, Epoch 5 / 20:  58%|█████▊    | 903/1563 [00:37<00:26, 24.82it/s]

batch 900 loss: 0.2609104707837105


Train, Epoch 5 / 20:  59%|█████▊    | 915/1563 [00:37<00:26, 24.64it/s]

batch 910 loss: 0.33511845767498016


Train, Epoch 5 / 20:  59%|█████▉    | 924/1563 [00:38<00:25, 24.77it/s]

batch 920 loss: 0.31201308220624924


Train, Epoch 5 / 20:  60%|█████▉    | 933/1563 [00:38<00:25, 24.57it/s]

batch 930 loss: 0.35993416160345076


Train, Epoch 5 / 20:  60%|██████    | 945/1563 [00:39<00:25, 24.56it/s]

batch 940 loss: 0.2984450817108154


Train, Epoch 5 / 20:  61%|██████    | 954/1563 [00:39<00:24, 24.76it/s]

batch 950 loss: 0.2995739236474037


Train, Epoch 5 / 20:  62%|██████▏   | 963/1563 [00:39<00:24, 24.75it/s]

batch 960 loss: 0.28857418671250346


Train, Epoch 5 / 20:  62%|██████▏   | 975/1563 [00:40<00:23, 24.72it/s]

batch 970 loss: 0.38219199925661085


Train, Epoch 5 / 20:  63%|██████▎   | 984/1563 [00:40<00:23, 24.75it/s]

batch 980 loss: 0.3044772297143936


Train, Epoch 5 / 20:  64%|██████▎   | 993/1563 [00:41<00:23, 24.75it/s]

batch 990 loss: 0.32457551509141924


Train, Epoch 5 / 20:  64%|██████▍   | 1005/1563 [00:41<00:22, 24.73it/s]

batch 1000 loss: 0.22720724642276763


Train, Epoch 5 / 20:  65%|██████▍   | 1014/1563 [00:41<00:24, 22.62it/s]

batch 1010 loss: 0.2915207877755165


Train, Epoch 5 / 20:  65%|██████▌   | 1023/1563 [00:42<00:24, 22.44it/s]

batch 1020 loss: 0.3116681382060051


Train, Epoch 5 / 20:  66%|██████▌   | 1032/1563 [00:42<00:23, 22.84it/s]

batch 1030 loss: 0.2977436505258083


Train, Epoch 5 / 20:  67%|██████▋   | 1044/1563 [00:43<00:22, 22.79it/s]

batch 1040 loss: 0.2703931197524071


Train, Epoch 5 / 20:  67%|██████▋   | 1053/1563 [00:43<00:23, 21.70it/s]

batch 1050 loss: 0.31197095215320586


Train, Epoch 5 / 20:  68%|██████▊   | 1065/1563 [00:44<00:21, 23.21it/s]

batch 1060 loss: 0.2847636789083481


Train, Epoch 5 / 20:  69%|██████▊   | 1074/1563 [00:44<00:20, 24.22it/s]

batch 1070 loss: 0.31636603474617003


Train, Epoch 5 / 20:  69%|██████▉   | 1083/1563 [00:44<00:19, 24.39it/s]

batch 1080 loss: 0.3092207536101341


Train, Epoch 5 / 20:  70%|███████   | 1095/1563 [00:45<00:19, 24.38it/s]

batch 1090 loss: 0.34147847816348076


Train, Epoch 5 / 20:  71%|███████   | 1104/1563 [00:45<00:18, 24.54it/s]

batch 1100 loss: 0.3183392763137817


Train, Epoch 5 / 20:  71%|███████   | 1113/1563 [00:46<00:18, 24.73it/s]

batch 1110 loss: 0.2942590519785881


Train, Epoch 5 / 20:  72%|███████▏  | 1125/1563 [00:46<00:17, 24.69it/s]

batch 1120 loss: 0.28388780951499937


Train, Epoch 5 / 20:  73%|███████▎  | 1134/1563 [00:47<00:17, 24.88it/s]

batch 1130 loss: 0.3464208319783211


Train, Epoch 5 / 20:  73%|███████▎  | 1143/1563 [00:47<00:17, 24.62it/s]

batch 1140 loss: 0.26367164254188535


Train, Epoch 5 / 20:  74%|███████▍  | 1155/1563 [00:47<00:16, 24.59it/s]

batch 1150 loss: 0.3032907426357269


Train, Epoch 5 / 20:  74%|███████▍  | 1164/1563 [00:48<00:16, 24.59it/s]

batch 1160 loss: 0.29914764910936353


Train, Epoch 5 / 20:  75%|███████▌  | 1173/1563 [00:48<00:15, 24.81it/s]

batch 1170 loss: 0.289614300429821


Train, Epoch 5 / 20:  76%|███████▌  | 1185/1563 [00:49<00:15, 24.71it/s]

batch 1180 loss: 0.32345872819423677


Train, Epoch 5 / 20:  76%|███████▋  | 1194/1563 [00:49<00:14, 24.61it/s]

batch 1190 loss: 0.39226792603731153


Train, Epoch 5 / 20:  77%|███████▋  | 1203/1563 [00:49<00:14, 24.72it/s]

batch 1200 loss: 0.34282336533069613


Train, Epoch 5 / 20:  78%|███████▊  | 1215/1563 [00:50<00:14, 24.84it/s]

batch 1210 loss: 0.30709597170352937


Train, Epoch 5 / 20:  78%|███████▊  | 1224/1563 [00:50<00:13, 24.92it/s]

batch 1220 loss: 0.32881111800670626


Train, Epoch 5 / 20:  79%|███████▉  | 1233/1563 [00:51<00:13, 24.56it/s]

batch 1230 loss: 0.32442514300346376


Train, Epoch 5 / 20:  80%|███████▉  | 1245/1563 [00:51<00:12, 24.69it/s]

batch 1240 loss: 0.24735936224460603


Train, Epoch 5 / 20:  80%|████████  | 1254/1563 [00:51<00:12, 24.59it/s]

batch 1250 loss: 0.3295058473944664


Train, Epoch 5 / 20:  81%|████████  | 1263/1563 [00:52<00:12, 24.75it/s]

batch 1260 loss: 0.30689396485686304


Train, Epoch 5 / 20:  82%|████████▏ | 1275/1563 [00:52<00:11, 24.79it/s]

batch 1270 loss: 0.35998585671186445


Train, Epoch 5 / 20:  82%|████████▏ | 1284/1563 [00:53<00:11, 24.82it/s]

batch 1280 loss: 0.33128500580787656


Train, Epoch 5 / 20:  83%|████████▎ | 1293/1563 [00:53<00:10, 24.64it/s]

batch 1290 loss: 0.3814608037471771


Train, Epoch 5 / 20:  83%|████████▎ | 1302/1563 [00:53<00:10, 24.66it/s]

batch 1300 loss: 0.30357891172170637


Train, Epoch 5 / 20:  84%|████████▍ | 1314/1563 [00:54<00:11, 22.17it/s]

batch 1310 loss: 0.3045256555080414


Train, Epoch 5 / 20:  85%|████████▍ | 1323/1563 [00:54<00:11, 21.80it/s]

batch 1320 loss: 0.328337536752224


Train, Epoch 5 / 20:  85%|████████▌ | 1332/1563 [00:55<00:10, 22.12it/s]

batch 1330 loss: 0.3257066398859024


Train, Epoch 5 / 20:  86%|████████▌ | 1344/1563 [00:55<00:10, 21.49it/s]

batch 1340 loss: 0.3308512017130852


Train, Epoch 5 / 20:  87%|████████▋ | 1353/1563 [00:56<00:09, 21.93it/s]

batch 1350 loss: 0.3030782401561737


Train, Epoch 5 / 20:  87%|████████▋ | 1362/1563 [00:56<00:08, 23.20it/s]

batch 1360 loss: 0.29420264065265656


Train, Epoch 5 / 20:  88%|████████▊ | 1374/1563 [00:57<00:07, 24.05it/s]

batch 1370 loss: 0.2376498721539974


Train, Epoch 5 / 20:  88%|████████▊ | 1383/1563 [00:57<00:07, 24.50it/s]

batch 1380 loss: 0.31122518330812454


Train, Epoch 5 / 20:  89%|████████▉ | 1395/1563 [00:57<00:06, 24.64it/s]

batch 1390 loss: 0.26919866278767585


Train, Epoch 5 / 20:  90%|████████▉ | 1404/1563 [00:58<00:06, 24.78it/s]

batch 1400 loss: 0.2482169397175312


Train, Epoch 5 / 20:  90%|█████████ | 1413/1563 [00:58<00:06, 24.59it/s]

batch 1410 loss: 0.31761838421225547


Train, Epoch 5 / 20:  91%|█████████ | 1422/1563 [00:59<00:05, 24.79it/s]

batch 1420 loss: 0.2680266387760639


Train, Epoch 5 / 20:  92%|█████████▏| 1434/1563 [00:59<00:05, 24.19it/s]

batch 1430 loss: 0.2932811349630356


Train, Epoch 5 / 20:  92%|█████████▏| 1443/1563 [00:59<00:04, 24.11it/s]

batch 1440 loss: 0.2927192270755768


Train, Epoch 5 / 20:  93%|█████████▎| 1455/1563 [01:00<00:04, 24.70it/s]

batch 1450 loss: 0.3430739678442478


Train, Epoch 5 / 20:  94%|█████████▎| 1464/1563 [01:00<00:03, 24.80it/s]

batch 1460 loss: 0.38746184557676316


Train, Epoch 5 / 20:  94%|█████████▍| 1473/1563 [01:01<00:03, 24.78it/s]

batch 1470 loss: 0.27234792932868


Train, Epoch 5 / 20:  95%|█████████▌| 1485/1563 [01:01<00:03, 24.69it/s]

batch 1480 loss: 0.21904212087392808


Train, Epoch 5 / 20:  96%|█████████▌| 1494/1563 [01:01<00:02, 24.84it/s]

batch 1490 loss: 0.30004407465457916


Train, Epoch 5 / 20:  96%|█████████▌| 1503/1563 [01:02<00:02, 24.69it/s]

batch 1500 loss: 0.21924312710762023


Train, Epoch 5 / 20:  97%|█████████▋| 1515/1563 [01:02<00:01, 24.64it/s]

batch 1510 loss: 0.26551644653081896


Train, Epoch 5 / 20:  98%|█████████▊| 1524/1563 [01:03<00:01, 24.66it/s]

batch 1520 loss: 0.3415455982089043


Train, Epoch 5 / 20:  98%|█████████▊| 1533/1563 [01:03<00:01, 24.42it/s]

batch 1530 loss: 0.2928279981017113


Train, Epoch 5 / 20:  99%|█████████▉| 1545/1563 [01:04<00:00, 24.89it/s]

batch 1540 loss: 0.31177197247743604


Train, Epoch 5 / 20:  99%|█████████▉| 1554/1563 [01:04<00:00, 24.75it/s]

batch 1550 loss: 0.27197555974125864


Train, Epoch 5 / 20: 100%|██████████| 1563/1563 [01:04<00:00, 24.14it/s]


batch 1560 loss: 0.3207321234047413


Test, Epoch 5 / 20: 100%|██████████| 1563/1563 [00:29<00:00, 52.20it/s]


Epoch 5, loss: 0.47918347507089376, accuracy: 0.7998


Train, Epoch 6 / 20:   1%|          | 15/1563 [00:00<01:01, 25.07it/s]

batch 10 loss: 0.36503555700182916


Train, Epoch 6 / 20:   2%|▏         | 24/1563 [00:00<01:01, 24.95it/s]

batch 20 loss: 0.3430963769555092


Train, Epoch 6 / 20:   2%|▏         | 33/1563 [00:01<01:01, 24.75it/s]

batch 30 loss: 0.3289932146668434


Train, Epoch 6 / 20:   3%|▎         | 42/1563 [00:01<01:01, 24.72it/s]

batch 40 loss: 0.2889015942811966


Train, Epoch 6 / 20:   3%|▎         | 54/1563 [00:02<01:00, 24.74it/s]

batch 50 loss: 0.20674899518489837


Train, Epoch 6 / 20:   4%|▍         | 63/1563 [00:02<01:00, 24.96it/s]

batch 60 loss: 0.3097059324383736


Train, Epoch 6 / 20:   5%|▍         | 75/1563 [00:03<00:59, 24.87it/s]

batch 70 loss: 0.2664672151207924


Train, Epoch 6 / 20:   5%|▌         | 84/1563 [00:03<00:59, 24.78it/s]

batch 80 loss: 0.29574511498212813


Train, Epoch 6 / 20:   6%|▌         | 93/1563 [00:03<00:59, 24.54it/s]

batch 90 loss: 0.23565436154603958


Train, Epoch 6 / 20:   7%|▋         | 105/1563 [00:04<00:58, 24.83it/s]

batch 100 loss: 0.31139546222984793


Train, Epoch 6 / 20:   7%|▋         | 114/1563 [00:04<00:58, 24.85it/s]

batch 110 loss: 0.20345909409224988


Train, Epoch 6 / 20:   8%|▊         | 123/1563 [00:04<00:58, 24.63it/s]

batch 120 loss: 0.2929609313607216


Train, Epoch 6 / 20:   9%|▊         | 135/1563 [00:05<00:57, 24.83it/s]

batch 130 loss: 0.3258093997836113


Train, Epoch 6 / 20:   9%|▉         | 144/1563 [00:05<00:57, 24.70it/s]

batch 140 loss: 0.28493274599313734


Train, Epoch 6 / 20:  10%|▉         | 153/1563 [00:06<00:57, 24.67it/s]

batch 150 loss: 0.3543813914060593


Train, Epoch 6 / 20:  11%|█         | 165/1563 [00:06<00:56, 24.70it/s]

batch 160 loss: 0.23621279150247573


Train, Epoch 6 / 20:  11%|█         | 174/1563 [00:07<00:55, 24.82it/s]

batch 170 loss: 0.2905553959310055


Train, Epoch 6 / 20:  12%|█▏        | 183/1563 [00:07<00:55, 24.71it/s]

batch 180 loss: 0.3042382910847664


Train, Epoch 6 / 20:  12%|█▏        | 192/1563 [00:07<00:55, 24.76it/s]

batch 190 loss: 0.3385286696255207


Train, Epoch 6 / 20:  13%|█▎        | 204/1563 [00:08<00:57, 23.63it/s]

batch 200 loss: 0.27740020006895066


Train, Epoch 6 / 20:  14%|█▎        | 213/1563 [00:08<00:59, 22.53it/s]

batch 210 loss: 0.26568014845252036


Train, Epoch 6 / 20:  14%|█▍        | 222/1563 [00:09<01:01, 21.64it/s]

batch 220 loss: 0.3116891779005527


Train, Epoch 6 / 20:  15%|█▍        | 234/1563 [00:09<01:00, 21.80it/s]

batch 230 loss: 0.32294399067759516


Train, Epoch 6 / 20:  16%|█▌        | 243/1563 [00:10<00:59, 22.03it/s]

batch 240 loss: 0.2443590134382248


Train, Epoch 6 / 20:  16%|█▋        | 255/1563 [00:10<00:56, 23.21it/s]

batch 250 loss: 0.26128740459680555


Train, Epoch 6 / 20:  17%|█▋        | 264/1563 [00:10<00:54, 24.01it/s]

batch 260 loss: 0.41815088391304017


Train, Epoch 6 / 20:  17%|█▋        | 273/1563 [00:11<00:52, 24.60it/s]

batch 270 loss: 0.2887952871620655


Train, Epoch 6 / 20:  18%|█▊        | 285/1563 [00:11<00:51, 24.72it/s]

batch 280 loss: 0.35639889538288116


Train, Epoch 6 / 20:  19%|█▉        | 294/1563 [00:12<00:51, 24.67it/s]

batch 290 loss: 0.24343771040439605


Train, Epoch 6 / 20:  19%|█▉        | 303/1563 [00:12<00:51, 24.70it/s]

batch 300 loss: 0.3069209173321724


Train, Epoch 6 / 20:  20%|██        | 315/1563 [00:13<00:50, 24.61it/s]

batch 310 loss: 0.3103819265961647


Train, Epoch 6 / 20:  21%|██        | 324/1563 [00:13<00:50, 24.74it/s]

batch 320 loss: 0.36144607663154604


Train, Epoch 6 / 20:  21%|██▏       | 333/1563 [00:13<00:49, 24.82it/s]

batch 330 loss: 0.2558197885751724


Train, Epoch 6 / 20:  22%|██▏       | 345/1563 [00:14<00:49, 24.71it/s]

batch 340 loss: 0.2547172661870718


Train, Epoch 6 / 20:  23%|██▎       | 354/1563 [00:14<00:48, 24.93it/s]

batch 350 loss: 0.2964706264436245


Train, Epoch 6 / 20:  23%|██▎       | 363/1563 [00:14<00:48, 24.72it/s]

batch 360 loss: 0.33198623210191724


Train, Epoch 6 / 20:  24%|██▍       | 375/1563 [00:15<00:47, 24.80it/s]

batch 370 loss: 0.31455814093351364


Train, Epoch 6 / 20:  25%|██▍       | 384/1563 [00:15<00:48, 24.51it/s]

batch 380 loss: 0.30727955549955366


Train, Epoch 6 / 20:  25%|██▌       | 393/1563 [00:16<00:48, 24.13it/s]

batch 390 loss: 0.3256685361266136


Train, Epoch 6 / 20:  26%|██▌       | 405/1563 [00:16<00:47, 24.52it/s]

batch 400 loss: 0.3688541904091835


Train, Epoch 6 / 20:  26%|██▋       | 414/1563 [00:17<00:46, 24.48it/s]

batch 410 loss: 0.30787878334522245


Train, Epoch 6 / 20:  27%|██▋       | 423/1563 [00:17<00:46, 24.70it/s]

batch 420 loss: 0.20714360177516938


Train, Epoch 6 / 20:  28%|██▊       | 435/1563 [00:17<00:45, 24.86it/s]

batch 430 loss: 0.33905299976468084


Train, Epoch 6 / 20:  28%|██▊       | 444/1563 [00:18<00:45, 24.86it/s]

batch 440 loss: 0.2991557314991951


Train, Epoch 6 / 20:  29%|██▉       | 453/1563 [00:18<00:44, 24.80it/s]

batch 450 loss: 0.3663825377821922


Train, Epoch 6 / 20:  30%|██▉       | 465/1563 [00:19<00:44, 24.91it/s]

batch 460 loss: 0.34468822479248046


Train, Epoch 6 / 20:  30%|███       | 474/1563 [00:19<00:44, 24.64it/s]

batch 470 loss: 0.314106909930706


Train, Epoch 6 / 20:  31%|███       | 483/1563 [00:19<00:44, 24.44it/s]

batch 480 loss: 0.327966645359993


Train, Epoch 6 / 20:  31%|███▏      | 492/1563 [00:20<00:44, 24.01it/s]

batch 490 loss: 0.3505388587713242


Train, Epoch 6 / 20:  32%|███▏      | 504/1563 [00:20<00:47, 22.26it/s]

batch 500 loss: 0.2810567244887352


Train, Epoch 6 / 20:  33%|███▎      | 513/1563 [00:21<00:47, 21.92it/s]

batch 510 loss: 0.2950515031814575


Train, Epoch 6 / 20:  33%|███▎      | 522/1563 [00:21<00:48, 21.49it/s]

batch 520 loss: 0.34355092197656634


Train, Epoch 6 / 20:  34%|███▍      | 534/1563 [00:22<00:48, 21.31it/s]

batch 530 loss: 0.31246894747018816


Train, Epoch 6 / 20:  35%|███▍      | 543/1563 [00:22<00:47, 21.57it/s]

batch 540 loss: 0.33092299550771714


Train, Epoch 6 / 20:  36%|███▌      | 555/1563 [00:23<00:42, 23.80it/s]

batch 550 loss: 0.26048133596777917


Train, Epoch 6 / 20:  36%|███▌      | 564/1563 [00:23<00:41, 24.21it/s]

batch 560 loss: 0.31674937456846236


Train, Epoch 6 / 20:  37%|███▋      | 573/1563 [00:23<00:40, 24.70it/s]

batch 570 loss: 0.26705724000930786


Train, Epoch 6 / 20:  37%|███▋      | 585/1563 [00:24<00:39, 24.74it/s]

batch 580 loss: 0.28443752974271774


Train, Epoch 6 / 20:  38%|███▊      | 594/1563 [00:24<00:39, 24.69it/s]

batch 590 loss: 0.20803405493497848


Train, Epoch 6 / 20:  39%|███▊      | 603/1563 [00:25<00:38, 24.79it/s]

batch 600 loss: 0.3531188353896141


Train, Epoch 6 / 20:  39%|███▉      | 612/1563 [00:25<00:39, 24.38it/s]

batch 610 loss: 0.2500558033585548


Train, Epoch 6 / 20:  40%|███▉      | 624/1563 [00:25<00:37, 24.77it/s]

batch 620 loss: 0.2833209328353405


Train, Epoch 6 / 20:  40%|████      | 633/1563 [00:26<00:37, 24.89it/s]

batch 630 loss: 0.24932217374444007


Train, Epoch 6 / 20:  41%|████▏     | 645/1563 [00:26<00:37, 24.79it/s]

batch 640 loss: 0.30017087012529375


Train, Epoch 6 / 20:  42%|████▏     | 654/1563 [00:27<00:36, 24.80it/s]

batch 650 loss: 0.3174867078661919


Train, Epoch 6 / 20:  42%|████▏     | 663/1563 [00:27<00:36, 24.80it/s]

batch 660 loss: 0.18307051286101342


Train, Epoch 6 / 20:  43%|████▎     | 675/1563 [00:27<00:36, 24.64it/s]

batch 670 loss: 0.31871424093842504


Train, Epoch 6 / 20:  44%|████▍     | 684/1563 [00:28<00:35, 24.55it/s]

batch 680 loss: 0.325355364382267


Train, Epoch 6 / 20:  44%|████▍     | 693/1563 [00:28<00:35, 24.67it/s]

batch 690 loss: 0.29660009890794753


Train, Epoch 6 / 20:  45%|████▌     | 705/1563 [00:29<00:34, 24.74it/s]

batch 700 loss: 0.260836823284626


Train, Epoch 6 / 20:  46%|████▌     | 714/1563 [00:29<00:34, 24.53it/s]

batch 710 loss: 0.28197581097483637


Train, Epoch 6 / 20:  46%|████▋     | 723/1563 [00:29<00:33, 24.76it/s]

batch 720 loss: 0.2748258411884308


Train, Epoch 6 / 20:  47%|████▋     | 735/1563 [00:30<00:33, 24.75it/s]

batch 730 loss: 0.3199464127421379


Train, Epoch 6 / 20:  48%|████▊     | 744/1563 [00:30<00:33, 24.75it/s]

batch 740 loss: 0.32542566359043124


Train, Epoch 6 / 20:  48%|████▊     | 753/1563 [00:31<00:32, 24.77it/s]

batch 750 loss: 0.2782344803214073


Train, Epoch 6 / 20:  49%|████▉     | 762/1563 [00:31<00:32, 24.86it/s]

batch 760 loss: 0.3538550496101379


Train, Epoch 6 / 20:  50%|████▉     | 774/1563 [00:32<00:31, 24.69it/s]

batch 770 loss: 0.2646300673484802


Train, Epoch 6 / 20:  50%|█████     | 783/1563 [00:32<00:31, 24.80it/s]

batch 780 loss: 0.2619709812104702


Train, Epoch 6 / 20:  51%|█████     | 792/1563 [00:32<00:32, 23.66it/s]

batch 790 loss: 0.31279917657375333


Train, Epoch 6 / 20:  51%|█████▏    | 804/1563 [00:33<00:34, 22.03it/s]

batch 800 loss: 0.3414975255727768


Train, Epoch 6 / 20:  52%|█████▏    | 813/1563 [00:33<00:34, 21.59it/s]

batch 810 loss: 0.3006595954298973


Train, Epoch 6 / 20:  53%|█████▎    | 822/1563 [00:34<00:34, 21.70it/s]

batch 820 loss: 0.2559140115976334


Train, Epoch 6 / 20:  53%|█████▎    | 834/1563 [00:34<00:33, 22.06it/s]

batch 830 loss: 0.3436496153473854


Train, Epoch 6 / 20:  54%|█████▍    | 843/1563 [00:35<00:30, 23.83it/s]

batch 840 loss: 0.40921918898820875


Train, Epoch 6 / 20:  55%|█████▍    | 855/1563 [00:35<00:28, 24.74it/s]

batch 850 loss: 0.20553693622350694


Train, Epoch 6 / 20:  55%|█████▌    | 864/1563 [00:35<00:28, 24.65it/s]

batch 860 loss: 0.33947386369109156


Train, Epoch 6 / 20:  56%|█████▌    | 873/1563 [00:36<00:27, 24.65it/s]

batch 870 loss: 0.27823693715035913


Train, Epoch 6 / 20:  57%|█████▋    | 885/1563 [00:36<00:27, 24.59it/s]

batch 880 loss: 0.21651579365134238


Train, Epoch 6 / 20:  57%|█████▋    | 894/1563 [00:37<00:27, 24.59it/s]

batch 890 loss: 0.29292048811912536


Train, Epoch 6 / 20:  58%|█████▊    | 903/1563 [00:37<00:26, 24.70it/s]

batch 900 loss: 0.28303593397140503


Train, Epoch 6 / 20:  59%|█████▊    | 915/1563 [00:37<00:26, 24.69it/s]

batch 910 loss: 0.37073012739419936


Train, Epoch 6 / 20:  59%|█████▉    | 924/1563 [00:38<00:25, 24.64it/s]

batch 920 loss: 0.2790824696421623


Train, Epoch 6 / 20:  60%|█████▉    | 933/1563 [00:38<00:25, 24.84it/s]

batch 930 loss: 0.299704073369503


Train, Epoch 6 / 20:  60%|██████    | 945/1563 [00:39<00:24, 24.78it/s]

batch 940 loss: 0.2842838287353516


Train, Epoch 6 / 20:  61%|██████    | 954/1563 [00:39<00:24, 24.80it/s]

batch 950 loss: 0.2467070996761322


Train, Epoch 6 / 20:  62%|██████▏   | 963/1563 [00:39<00:24, 24.63it/s]

batch 960 loss: 0.30672639757394793


Train, Epoch 6 / 20:  62%|██████▏   | 972/1563 [00:40<00:24, 24.47it/s]

batch 970 loss: 0.31301895081996917


Train, Epoch 6 / 20:  63%|██████▎   | 984/1563 [00:40<00:23, 24.84it/s]

batch 980 loss: 0.2961768038570881


Train, Epoch 6 / 20:  64%|██████▎   | 993/1563 [00:41<00:23, 24.59it/s]

batch 990 loss: 0.29487083852291107


Train, Epoch 6 / 20:  64%|██████▍   | 1005/1563 [00:41<00:22, 24.77it/s]

batch 1000 loss: 0.3330188512802124


Train, Epoch 6 / 20:  65%|██████▍   | 1014/1563 [00:42<00:22, 24.87it/s]

batch 1010 loss: 0.27772290632128716


Train, Epoch 6 / 20:  65%|██████▌   | 1023/1563 [00:42<00:21, 24.79it/s]

batch 1020 loss: 0.273397858440876


Train, Epoch 6 / 20:  66%|██████▌   | 1035/1563 [00:42<00:21, 24.70it/s]

batch 1030 loss: 0.3391480639576912


Train, Epoch 6 / 20:  67%|██████▋   | 1044/1563 [00:43<00:21, 24.69it/s]

batch 1040 loss: 0.26569068878889085


Train, Epoch 6 / 20:  67%|██████▋   | 1053/1563 [00:43<00:20, 24.32it/s]

batch 1050 loss: 0.2520316794514656


Train, Epoch 6 / 20:  68%|██████▊   | 1065/1563 [00:44<00:20, 24.66it/s]

batch 1060 loss: 0.27831833213567736


Train, Epoch 6 / 20:  69%|██████▊   | 1074/1563 [00:44<00:19, 24.93it/s]

batch 1070 loss: 0.368662565946579


Train, Epoch 6 / 20:  69%|██████▉   | 1083/1563 [00:44<00:19, 24.45it/s]

batch 1080 loss: 0.2897850573062897


Train, Epoch 6 / 20:  70%|██████▉   | 1092/1563 [00:45<00:21, 22.41it/s]

batch 1090 loss: 0.3815313816070557


Train, Epoch 6 / 20:  71%|███████   | 1104/1563 [00:45<00:20, 22.14it/s]

batch 1100 loss: 0.2597869120538235


Train, Epoch 6 / 20:  71%|███████   | 1113/1563 [00:46<00:20, 22.34it/s]

batch 1110 loss: 0.24663862437009812


Train, Epoch 6 / 20:  72%|███████▏  | 1122/1563 [00:46<00:20, 21.89it/s]

batch 1120 loss: 0.29664234817028046


Train, Epoch 6 / 20:  73%|███████▎  | 1134/1563 [00:47<00:19, 22.36it/s]

batch 1130 loss: 0.28651567697525027


Train, Epoch 6 / 20:  73%|███████▎  | 1143/1563 [00:47<00:17, 23.88it/s]

batch 1140 loss: 0.32006255686283114


Train, Epoch 6 / 20:  74%|███████▍  | 1155/1563 [00:47<00:16, 24.68it/s]

batch 1150 loss: 0.3032817155122757


Train, Epoch 6 / 20:  74%|███████▍  | 1164/1563 [00:48<00:16, 24.37it/s]

batch 1160 loss: 0.36473026499152184


Train, Epoch 6 / 20:  75%|███████▌  | 1173/1563 [00:48<00:15, 24.52it/s]

batch 1170 loss: 0.26217208579182627


Train, Epoch 6 / 20:  76%|███████▌  | 1185/1563 [00:49<00:15, 24.69it/s]

batch 1180 loss: 0.26988294124603274


Train, Epoch 6 / 20:  76%|███████▋  | 1194/1563 [00:49<00:14, 24.92it/s]

batch 1190 loss: 0.39414617121219636


Train, Epoch 6 / 20:  77%|███████▋  | 1203/1563 [00:49<00:14, 24.48it/s]

batch 1200 loss: 0.31314236894249914


Train, Epoch 6 / 20:  78%|███████▊  | 1215/1563 [00:50<00:14, 24.57it/s]

batch 1210 loss: 0.2695547342300415


Train, Epoch 6 / 20:  78%|███████▊  | 1224/1563 [00:50<00:13, 24.41it/s]

batch 1220 loss: 0.3624474681913853


Train, Epoch 6 / 20:  79%|███████▉  | 1233/1563 [00:51<00:13, 24.06it/s]

batch 1230 loss: 0.37459849417209623


Train, Epoch 6 / 20:  80%|███████▉  | 1245/1563 [00:51<00:13, 24.43it/s]

batch 1240 loss: 0.27582675963640213


Train, Epoch 6 / 20:  80%|████████  | 1254/1563 [00:52<00:12, 24.67it/s]

batch 1250 loss: 0.28296408653259275


Train, Epoch 6 / 20:  81%|████████  | 1263/1563 [00:52<00:12, 24.31it/s]

batch 1260 loss: 0.24688778072595596


Train, Epoch 6 / 20:  82%|████████▏ | 1275/1563 [00:52<00:11, 24.84it/s]

batch 1270 loss: 0.3046261578798294


Train, Epoch 6 / 20:  82%|████████▏ | 1284/1563 [00:53<00:11, 24.54it/s]

batch 1280 loss: 0.3397088721394539


Train, Epoch 6 / 20:  83%|████████▎ | 1293/1563 [00:53<00:10, 24.85it/s]

batch 1290 loss: 0.3309257224202156


Train, Epoch 6 / 20:  83%|████████▎ | 1305/1563 [00:54<00:10, 24.67it/s]

batch 1300 loss: 0.2603126995265484


Train, Epoch 6 / 20:  84%|████████▍ | 1314/1563 [00:54<00:10, 24.61it/s]

batch 1310 loss: 0.3306896224617958


Train, Epoch 6 / 20:  85%|████████▍ | 1323/1563 [00:54<00:09, 24.79it/s]

batch 1320 loss: 0.2787975698709488


Train, Epoch 6 / 20:  85%|████████▌ | 1332/1563 [00:55<00:09, 24.58it/s]

batch 1330 loss: 0.25253442078828814


Train, Epoch 6 / 20:  86%|████████▌ | 1344/1563 [00:55<00:08, 24.76it/s]

batch 1340 loss: 0.28775637298822404


Train, Epoch 6 / 20:  87%|████████▋ | 1353/1563 [00:56<00:08, 24.85it/s]

batch 1350 loss: 0.3046144306659698


Train, Epoch 6 / 20:  87%|████████▋ | 1365/1563 [00:56<00:07, 24.75it/s]

batch 1360 loss: 0.24170757085084915


Train, Epoch 6 / 20:  88%|████████▊ | 1374/1563 [00:56<00:07, 24.63it/s]

batch 1370 loss: 0.4069571107625961


Train, Epoch 6 / 20:  88%|████████▊ | 1383/1563 [00:57<00:07, 23.05it/s]

batch 1380 loss: 0.3246632143855095


Train, Epoch 6 / 20:  89%|████████▉ | 1392/1563 [00:57<00:07, 22.09it/s]

batch 1390 loss: 0.31265023574233053


Train, Epoch 6 / 20:  90%|████████▉ | 1404/1563 [00:58<00:07, 22.32it/s]

batch 1400 loss: 0.33759343773126604


Train, Epoch 6 / 20:  90%|█████████ | 1413/1563 [00:58<00:06, 21.93it/s]

batch 1410 loss: 0.32182937785983085


Train, Epoch 6 / 20:  91%|█████████ | 1422/1563 [00:59<00:06, 21.80it/s]

batch 1420 loss: 0.2546224519610405


Train, Epoch 6 / 20:  92%|█████████▏| 1434/1563 [00:59<00:05, 23.41it/s]

batch 1430 loss: 0.2929899573326111


Train, Epoch 6 / 20:  92%|█████████▏| 1443/1563 [00:59<00:04, 24.39it/s]

batch 1440 loss: 0.33577715158462523


Train, Epoch 6 / 20:  93%|█████████▎| 1452/1563 [01:00<00:04, 24.61it/s]

batch 1450 loss: 0.3031162187457085


Train, Epoch 6 / 20:  94%|█████████▎| 1464/1563 [01:00<00:04, 24.74it/s]

batch 1460 loss: 0.2714853294193745


Train, Epoch 6 / 20:  94%|█████████▍| 1473/1563 [01:01<00:03, 24.83it/s]

batch 1470 loss: 0.2779060110449791


Train, Epoch 6 / 20:  95%|█████████▌| 1485/1563 [01:01<00:03, 24.60it/s]

batch 1480 loss: 0.2884685754776001


Train, Epoch 6 / 20:  96%|█████████▌| 1494/1563 [01:02<00:02, 24.77it/s]

batch 1490 loss: 0.2940255731344223


Train, Epoch 6 / 20:  96%|█████████▌| 1503/1563 [01:02<00:02, 24.87it/s]

batch 1500 loss: 0.259195414185524


Train, Epoch 6 / 20:  97%|█████████▋| 1515/1563 [01:02<00:01, 24.45it/s]

batch 1510 loss: 0.30161636620759963


Train, Epoch 6 / 20:  98%|█████████▊| 1524/1563 [01:03<00:01, 24.57it/s]

batch 1520 loss: 0.3354623816907406


Train, Epoch 6 / 20:  98%|█████████▊| 1533/1563 [01:03<00:01, 24.42it/s]

batch 1530 loss: 0.3686626642942429


Train, Epoch 6 / 20:  99%|█████████▉| 1545/1563 [01:04<00:00, 24.51it/s]

batch 1540 loss: 0.3325867630541325


Train, Epoch 6 / 20:  99%|█████████▉| 1554/1563 [01:04<00:00, 24.88it/s]

batch 1550 loss: 0.2659283883869648


Train, Epoch 6 / 20: 100%|██████████| 1563/1563 [01:04<00:00, 24.11it/s]


batch 1560 loss: 0.24656369760632516


Test, Epoch 6 / 20: 100%|██████████| 1563/1563 [00:29<00:00, 52.45it/s]


Epoch 6, loss: 0.45003128020226957, accuracy: 0.806


Train, Epoch 7 / 20:   1%|          | 12/1563 [00:00<01:07, 22.85it/s]

batch 10 loss: 0.25142631232738494


Train, Epoch 7 / 20:   2%|▏         | 24/1563 [00:01<01:10, 21.79it/s]

batch 20 loss: 0.27888473123311996


Train, Epoch 7 / 20:   2%|▏         | 33/1563 [00:01<01:12, 21.17it/s]

batch 30 loss: 0.3279791191220284


Train, Epoch 7 / 20:   3%|▎         | 42/1563 [00:01<01:06, 22.95it/s]

batch 40 loss: 0.2646473072469234


Train, Epoch 7 / 20:   3%|▎         | 54/1563 [00:02<01:01, 24.34it/s]

batch 50 loss: 0.2319739505648613


Train, Epoch 7 / 20:   4%|▍         | 63/1563 [00:02<01:00, 24.68it/s]

batch 60 loss: 0.3021840050816536


Train, Epoch 7 / 20:   5%|▍         | 75/1563 [00:03<01:00, 24.67it/s]

batch 70 loss: 0.27900908663868906


Train, Epoch 7 / 20:   5%|▌         | 84/1563 [00:03<00:59, 24.81it/s]

batch 80 loss: 0.3649205848574638


Train, Epoch 7 / 20:   6%|▌         | 93/1563 [00:03<01:00, 24.46it/s]

batch 90 loss: 0.26554659456014634


Train, Epoch 7 / 20:   7%|▋         | 105/1563 [00:04<00:58, 24.81it/s]

batch 100 loss: 0.307712721824646


Train, Epoch 7 / 20:   7%|▋         | 114/1563 [00:04<00:58, 24.81it/s]

batch 110 loss: 0.24651936292648316


Train, Epoch 7 / 20:   8%|▊         | 123/1563 [00:05<00:58, 24.42it/s]

batch 120 loss: 0.3741797834634781


Train, Epoch 7 / 20:   9%|▊         | 135/1563 [00:05<00:57, 24.63it/s]

batch 130 loss: 0.2262973114848137


Train, Epoch 7 / 20:   9%|▉         | 144/1563 [00:06<00:57, 24.62it/s]

batch 140 loss: 0.1766423612833023


Train, Epoch 7 / 20:  10%|▉         | 153/1563 [00:06<00:57, 24.63it/s]

batch 150 loss: 0.2256990723311901


Train, Epoch 7 / 20:  11%|█         | 165/1563 [00:06<00:57, 24.47it/s]

batch 160 loss: 0.23031701296567916


Train, Epoch 7 / 20:  11%|█         | 174/1563 [00:07<00:56, 24.62it/s]

batch 170 loss: 0.2803169645369053


Train, Epoch 7 / 20:  12%|█▏        | 183/1563 [00:07<00:55, 24.67it/s]

batch 180 loss: 0.33911608904600143


Train, Epoch 7 / 20:  12%|█▏        | 195/1563 [00:08<00:55, 24.63it/s]

batch 190 loss: 0.45188664495944975


Train, Epoch 7 / 20:  13%|█▎        | 204/1563 [00:08<00:55, 24.61it/s]

batch 200 loss: 0.23562025874853135


Train, Epoch 7 / 20:  14%|█▎        | 213/1563 [00:08<00:54, 24.63it/s]

batch 210 loss: 0.2772284775972366


Train, Epoch 7 / 20:  14%|█▍        | 225/1563 [00:09<00:54, 24.71it/s]

batch 220 loss: 0.27530478686094284


Train, Epoch 7 / 20:  15%|█▍        | 234/1563 [00:09<00:53, 24.76it/s]

batch 230 loss: 0.26804143786430357


Train, Epoch 7 / 20:  16%|█▌        | 243/1563 [00:10<00:53, 24.68it/s]

batch 240 loss: 0.3349872440099716


Train, Epoch 7 / 20:  16%|█▋        | 255/1563 [00:10<00:52, 24.72it/s]

batch 250 loss: 0.217095347866416


Train, Epoch 7 / 20:  17%|█▋        | 264/1563 [00:10<00:52, 24.73it/s]

batch 260 loss: 0.2996935546398163


Train, Epoch 7 / 20:  17%|█▋        | 273/1563 [00:11<00:52, 24.41it/s]

batch 270 loss: 0.3807292848825455


Train, Epoch 7 / 20:  18%|█▊        | 282/1563 [00:11<00:53, 23.89it/s]

batch 280 loss: 0.26713343188166616


Train, Epoch 7 / 20:  19%|█▉        | 294/1563 [00:12<00:57, 22.02it/s]

batch 290 loss: 0.2618698880076408


Train, Epoch 7 / 20:  19%|█▉        | 303/1563 [00:12<00:58, 21.70it/s]

batch 300 loss: 0.2810028210282326


Train, Epoch 7 / 20:  20%|█▉        | 312/1563 [00:13<00:56, 22.00it/s]

batch 310 loss: 0.3353216364979744


Train, Epoch 7 / 20:  21%|██        | 324/1563 [00:13<00:58, 21.34it/s]

batch 320 loss: 0.260188952088356


Train, Epoch 7 / 20:  21%|██▏       | 333/1563 [00:14<00:54, 22.46it/s]

batch 330 loss: 0.37383167147636415


Train, Epoch 7 / 20:  22%|██▏       | 345/1563 [00:14<00:50, 24.15it/s]

batch 340 loss: 0.30757108628749846


Train, Epoch 7 / 20:  23%|██▎       | 354/1563 [00:14<00:48, 24.74it/s]

batch 350 loss: 0.25516103990375993


Train, Epoch 7 / 20:  23%|██▎       | 363/1563 [00:15<00:48, 24.62it/s]

batch 360 loss: 0.23190804421901703


Train, Epoch 7 / 20:  24%|██▍       | 372/1563 [00:15<00:48, 24.45it/s]

batch 370 loss: 0.29571177139878274


Train, Epoch 7 / 20:  25%|██▍       | 384/1563 [00:16<00:47, 24.86it/s]

batch 380 loss: 0.23357662819325925


Train, Epoch 7 / 20:  25%|██▌       | 393/1563 [00:16<00:47, 24.63it/s]

batch 390 loss: 0.35235499292612077


Train, Epoch 7 / 20:  26%|██▌       | 405/1563 [00:16<00:47, 24.50it/s]

batch 400 loss: 0.3296806335449219


Train, Epoch 7 / 20:  26%|██▋       | 414/1563 [00:17<00:46, 24.74it/s]

batch 410 loss: 0.27221590355038644


Train, Epoch 7 / 20:  27%|██▋       | 423/1563 [00:17<00:46, 24.75it/s]

batch 420 loss: 0.2738885171711445


Train, Epoch 7 / 20:  28%|██▊       | 432/1563 [00:18<00:46, 24.36it/s]

batch 430 loss: 0.3546203263103962


Train, Epoch 7 / 20:  28%|██▊       | 444/1563 [00:18<00:45, 24.36it/s]

batch 440 loss: 0.3685740627348423


Train, Epoch 7 / 20:  29%|██▉       | 453/1563 [00:18<00:44, 24.67it/s]

batch 450 loss: 0.24685775637626647


Train, Epoch 7 / 20:  30%|██▉       | 465/1563 [00:19<00:44, 24.47it/s]

batch 460 loss: 0.27296764552593233


Train, Epoch 7 / 20:  30%|███       | 474/1563 [00:19<00:44, 24.61it/s]

batch 470 loss: 0.32466202452778814


Train, Epoch 7 / 20:  31%|███       | 483/1563 [00:20<00:43, 24.74it/s]

batch 480 loss: 0.24024855345487595


Train, Epoch 7 / 20:  32%|███▏      | 495/1563 [00:20<00:42, 24.86it/s]

batch 490 loss: 0.2289329446852207


Train, Epoch 7 / 20:  32%|███▏      | 504/1563 [00:20<00:42, 24.65it/s]

batch 500 loss: 0.29635044634342195


Train, Epoch 7 / 20:  33%|███▎      | 513/1563 [00:21<00:42, 24.53it/s]

batch 510 loss: 0.2891277506947517


Train, Epoch 7 / 20:  34%|███▎      | 525/1563 [00:21<00:42, 24.63it/s]

batch 520 loss: 0.3364119790494442


Train, Epoch 7 / 20:  34%|███▍      | 534/1563 [00:22<00:41, 24.84it/s]

batch 530 loss: 0.2720355004072189


Train, Epoch 7 / 20:  35%|███▍      | 543/1563 [00:22<00:41, 24.62it/s]

batch 540 loss: 0.27434066906571386


Train, Epoch 7 / 20:  36%|███▌      | 555/1563 [00:23<00:40, 24.84it/s]

batch 550 loss: 0.23378657288849353


Train, Epoch 7 / 20:  36%|███▌      | 564/1563 [00:23<00:40, 24.84it/s]

batch 560 loss: 0.3224261112511158


Train, Epoch 7 / 20:  37%|███▋      | 573/1563 [00:23<00:40, 24.55it/s]

batch 570 loss: 0.30980801954865456


Train, Epoch 7 / 20:  37%|███▋      | 582/1563 [00:24<00:42, 23.23it/s]

batch 580 loss: 0.3012402817606926


Train, Epoch 7 / 20:  38%|███▊      | 594/1563 [00:24<00:45, 21.32it/s]

batch 590 loss: 0.2630386658012867


Train, Epoch 7 / 20:  39%|███▊      | 603/1563 [00:25<00:44, 21.50it/s]

batch 600 loss: 0.2740949459373951


Train, Epoch 7 / 20:  39%|███▉      | 612/1563 [00:25<00:44, 21.47it/s]

batch 610 loss: 0.27282461524009705


Train, Epoch 7 / 20:  40%|███▉      | 624/1563 [00:26<00:42, 21.98it/s]

batch 620 loss: 0.2900268159806728


Train, Epoch 7 / 20:  40%|████      | 633/1563 [00:26<00:39, 23.60it/s]

batch 630 loss: 0.3054504066705704


Train, Epoch 7 / 20:  41%|████▏     | 645/1563 [00:26<00:37, 24.26it/s]

batch 640 loss: 0.3139794781804085


Train, Epoch 7 / 20:  42%|████▏     | 654/1563 [00:27<00:36, 24.67it/s]

batch 650 loss: 0.24160506017506123


Train, Epoch 7 / 20:  42%|████▏     | 663/1563 [00:27<00:36, 24.53it/s]

batch 660 loss: 0.3273476079106331


Train, Epoch 7 / 20:  43%|████▎     | 675/1563 [00:28<00:35, 24.85it/s]

batch 670 loss: 0.2723852418363094


Train, Epoch 7 / 20:  44%|████▍     | 684/1563 [00:28<00:35, 24.76it/s]

batch 680 loss: 0.24627701044082642


Train, Epoch 7 / 20:  44%|████▍     | 693/1563 [00:28<00:35, 24.66it/s]

batch 690 loss: 0.2852532431483269


Train, Epoch 7 / 20:  45%|████▍     | 702/1563 [00:29<00:34, 24.79it/s]

batch 700 loss: 0.31646292507648466


Train, Epoch 7 / 20:  46%|████▌     | 714/1563 [00:29<00:34, 24.77it/s]

batch 710 loss: 0.25649411231279373


Train, Epoch 7 / 20:  46%|████▋     | 723/1563 [00:30<00:34, 24.70it/s]

batch 720 loss: 0.25126215666532514


Train, Epoch 7 / 20:  47%|████▋     | 735/1563 [00:30<00:33, 24.72it/s]

batch 730 loss: 0.3453979939222336


Train, Epoch 7 / 20:  48%|████▊     | 744/1563 [00:31<00:33, 24.75it/s]

batch 740 loss: 0.25676259100437165


Train, Epoch 7 / 20:  48%|████▊     | 753/1563 [00:31<00:33, 24.51it/s]

batch 750 loss: 0.27038301043212415


Train, Epoch 7 / 20:  49%|████▉     | 765/1563 [00:31<00:32, 24.79it/s]

batch 760 loss: 0.34361653551459315


Train, Epoch 7 / 20:  50%|████▉     | 774/1563 [00:32<00:32, 24.65it/s]

batch 770 loss: 0.32428759187459943


Train, Epoch 7 / 20:  50%|█████     | 783/1563 [00:32<00:31, 24.52it/s]

batch 780 loss: 0.2765238516032696


Train, Epoch 7 / 20:  51%|█████     | 795/1563 [00:33<00:31, 24.66it/s]

batch 790 loss: 0.20477857291698456


Train, Epoch 7 / 20:  51%|█████▏    | 804/1563 [00:33<00:30, 24.79it/s]

batch 800 loss: 0.24279894679784775


Train, Epoch 7 / 20:  52%|█████▏    | 813/1563 [00:33<00:30, 24.72it/s]

batch 810 loss: 0.2010868787765503


Train, Epoch 7 / 20:  53%|█████▎    | 825/1563 [00:34<00:29, 24.82it/s]

batch 820 loss: 0.286409330368042


Train, Epoch 7 / 20:  53%|█████▎    | 834/1563 [00:34<00:29, 24.81it/s]

batch 830 loss: 0.27876108661293986


Train, Epoch 7 / 20:  54%|█████▍    | 843/1563 [00:35<00:29, 24.57it/s]

batch 840 loss: 0.24656605795025827


Train, Epoch 7 / 20:  55%|█████▍    | 855/1563 [00:35<00:28, 24.87it/s]

batch 850 loss: 0.3298284709453583


Train, Epoch 7 / 20:  55%|█████▌    | 864/1563 [00:35<00:28, 24.37it/s]

batch 860 loss: 0.2751151517033577


Train, Epoch 7 / 20:  56%|█████▌    | 873/1563 [00:36<00:29, 23.43it/s]

batch 870 loss: 0.2837721958756447


Train, Epoch 7 / 20:  56%|█████▋    | 882/1563 [00:36<00:31, 21.93it/s]

batch 880 loss: 0.30688548684120176


Train, Epoch 7 / 20:  57%|█████▋    | 894/1563 [00:37<00:30, 21.82it/s]

batch 890 loss: 0.3527737259864807


Train, Epoch 7 / 20:  58%|█████▊    | 903/1563 [00:37<00:29, 22.12it/s]

batch 900 loss: 0.3509263515472412


Train, Epoch 7 / 20:  58%|█████▊    | 912/1563 [00:38<00:30, 21.41it/s]

batch 910 loss: 0.2366989016532898


Train, Epoch 7 / 20:  59%|█████▉    | 924/1563 [00:38<00:27, 23.34it/s]

batch 920 loss: 0.2303773857653141


Train, Epoch 7 / 20:  60%|█████▉    | 933/1563 [00:38<00:26, 24.17it/s]

batch 930 loss: 0.29530925378203393


Train, Epoch 7 / 20:  60%|██████    | 945/1563 [00:39<00:25, 24.62it/s]

batch 940 loss: 0.2667640492320061


Train, Epoch 7 / 20:  61%|██████    | 954/1563 [00:39<00:24, 24.45it/s]

batch 950 loss: 0.29250601306557655


Train, Epoch 7 / 20:  62%|██████▏   | 963/1563 [00:40<00:24, 24.65it/s]

batch 960 loss: 0.29595649391412737


Train, Epoch 7 / 20:  62%|██████▏   | 975/1563 [00:40<00:23, 24.78it/s]

batch 970 loss: 0.2765043556690216


Train, Epoch 7 / 20:  63%|██████▎   | 984/1563 [00:41<00:23, 24.68it/s]

batch 980 loss: 0.45652770549058913


Train, Epoch 7 / 20:  64%|██████▎   | 993/1563 [00:41<00:23, 24.65it/s]

batch 990 loss: 0.2970050752162933


Train, Epoch 7 / 20:  64%|██████▍   | 1005/1563 [00:41<00:22, 24.49it/s]

batch 1000 loss: 0.33347136378288267


Train, Epoch 7 / 20:  65%|██████▍   | 1014/1563 [00:42<00:22, 24.65it/s]

batch 1010 loss: 0.2806257516145706


Train, Epoch 7 / 20:  65%|██████▌   | 1023/1563 [00:42<00:21, 24.70it/s]

batch 1020 loss: 0.2761934623122215


Train, Epoch 7 / 20:  66%|██████▌   | 1032/1563 [00:42<00:21, 24.73it/s]

batch 1030 loss: 0.296089930832386


Train, Epoch 7 / 20:  67%|██████▋   | 1044/1563 [00:43<00:20, 24.79it/s]

batch 1040 loss: 0.32471849769353867


Train, Epoch 7 / 20:  67%|██████▋   | 1053/1563 [00:43<00:20, 24.79it/s]

batch 1050 loss: 0.28662688210606574


Train, Epoch 7 / 20:  68%|██████▊   | 1065/1563 [00:44<00:20, 24.65it/s]

batch 1060 loss: 0.2827044792473316


Train, Epoch 7 / 20:  69%|██████▊   | 1074/1563 [00:44<00:19, 24.75it/s]

batch 1070 loss: 0.3178455628454685


Train, Epoch 7 / 20:  69%|██████▉   | 1083/1563 [00:45<00:19, 24.69it/s]

batch 1080 loss: 0.23633382469415665


Train, Epoch 7 / 20:  70%|███████   | 1095/1563 [00:45<00:18, 24.71it/s]

batch 1090 loss: 0.3674852207303047


Train, Epoch 7 / 20:  71%|███████   | 1104/1563 [00:45<00:18, 24.76it/s]

batch 1100 loss: 0.3070383921265602


Train, Epoch 7 / 20:  71%|███████   | 1113/1563 [00:46<00:18, 24.45it/s]

batch 1110 loss: 0.2991325527429581


Train, Epoch 7 / 20:  72%|███████▏  | 1125/1563 [00:46<00:17, 24.74it/s]

batch 1120 loss: 0.2704943247139454


Train, Epoch 7 / 20:  73%|███████▎  | 1134/1563 [00:47<00:17, 23.86it/s]

batch 1130 loss: 0.28413676619529726


Train, Epoch 7 / 20:  73%|███████▎  | 1143/1563 [00:47<00:17, 24.30it/s]

batch 1140 loss: 0.23247253149747849


Train, Epoch 7 / 20:  74%|███████▍  | 1155/1563 [00:48<00:16, 24.73it/s]

batch 1150 loss: 0.3105552069842815


Train, Epoch 7 / 20:  74%|███████▍  | 1164/1563 [00:48<00:16, 23.79it/s]

batch 1160 loss: 0.29286505207419394


Train, Epoch 7 / 20:  75%|███████▌  | 1173/1563 [00:48<00:17, 22.07it/s]

batch 1170 loss: 0.2878207892179489


Train, Epoch 7 / 20:  76%|███████▌  | 1182/1563 [00:49<00:17, 21.27it/s]

batch 1180 loss: 0.28392836898565293


Train, Epoch 7 / 20:  76%|███████▋  | 1194/1563 [00:49<00:17, 21.43it/s]

batch 1190 loss: 0.2928843282163143


Train, Epoch 7 / 20:  77%|███████▋  | 1203/1563 [00:50<00:16, 21.41it/s]

batch 1200 loss: 0.4000027760863304


Train, Epoch 7 / 20:  78%|███████▊  | 1215/1563 [00:50<00:15, 22.68it/s]

batch 1210 loss: 0.3304890044033527


Train, Epoch 7 / 20:  78%|███████▊  | 1224/1563 [00:51<00:14, 23.95it/s]

batch 1220 loss: 0.304067013412714


Train, Epoch 7 / 20:  79%|███████▉  | 1233/1563 [00:51<00:13, 24.21it/s]

batch 1230 loss: 0.30746866166591647


Train, Epoch 7 / 20:  80%|███████▉  | 1245/1563 [00:51<00:12, 24.79it/s]

batch 1240 loss: 0.2900254651904106


Train, Epoch 7 / 20:  80%|████████  | 1254/1563 [00:52<00:12, 24.64it/s]

batch 1250 loss: 0.2638028673827648


Train, Epoch 7 / 20:  81%|████████  | 1263/1563 [00:52<00:12, 24.84it/s]

batch 1260 loss: 0.26901416331529615


Train, Epoch 7 / 20:  82%|████████▏ | 1275/1563 [00:53<00:11, 24.70it/s]

batch 1270 loss: 0.2349069558084011


Train, Epoch 7 / 20:  82%|████████▏ | 1284/1563 [00:53<00:11, 24.64it/s]

batch 1280 loss: 0.26080356240272523


Train, Epoch 7 / 20:  83%|████████▎ | 1293/1563 [00:53<00:10, 24.66it/s]

batch 1290 loss: 0.2926682695746422


Train, Epoch 7 / 20:  83%|████████▎ | 1305/1563 [00:54<00:10, 24.69it/s]

batch 1300 loss: 0.29139201939105985


Train, Epoch 7 / 20:  84%|████████▍ | 1314/1563 [00:54<00:10, 24.50it/s]

batch 1310 loss: 0.24769603163003923


Train, Epoch 7 / 20:  85%|████████▍ | 1323/1563 [00:55<00:09, 24.59it/s]

batch 1320 loss: 0.3087778702378273


Train, Epoch 7 / 20:  85%|████████▌ | 1335/1563 [00:55<00:09, 24.34it/s]

batch 1330 loss: 0.24707295894622802


Train, Epoch 7 / 20:  86%|████████▌ | 1344/1563 [00:56<00:08, 24.62it/s]

batch 1340 loss: 0.35396951287984846


Train, Epoch 7 / 20:  87%|████████▋ | 1353/1563 [00:56<00:08, 23.93it/s]

batch 1350 loss: 0.41144015192985534


Train, Epoch 7 / 20:  87%|████████▋ | 1365/1563 [00:56<00:08, 24.61it/s]

batch 1360 loss: 0.3032369360327721


Train, Epoch 7 / 20:  88%|████████▊ | 1374/1563 [00:57<00:07, 24.76it/s]

batch 1370 loss: 0.24177702963352204


Train, Epoch 7 / 20:  88%|████████▊ | 1383/1563 [00:57<00:07, 24.23it/s]

batch 1380 loss: 0.31579920575022696


Train, Epoch 7 / 20:  89%|████████▉ | 1395/1563 [00:58<00:06, 24.53it/s]

batch 1390 loss: 0.3214674010872841


Train, Epoch 7 / 20:  90%|████████▉ | 1404/1563 [00:58<00:06, 24.48it/s]

batch 1400 loss: 0.2924426689743996


Train, Epoch 7 / 20:  90%|█████████ | 1413/1563 [00:58<00:06, 24.70it/s]

batch 1410 loss: 0.3655315116047859


Train, Epoch 7 / 20:  91%|█████████ | 1425/1563 [00:59<00:05, 24.67it/s]

batch 1420 loss: 0.3319954439997673


Train, Epoch 7 / 20:  92%|█████████▏| 1434/1563 [00:59<00:05, 24.71it/s]

batch 1430 loss: 0.31423617005348203


Train, Epoch 7 / 20:  92%|█████████▏| 1443/1563 [01:00<00:04, 24.92it/s]

batch 1440 loss: 0.3121711567044258


Train, Epoch 7 / 20:  93%|█████████▎| 1452/1563 [01:00<00:04, 24.82it/s]

batch 1450 loss: 0.25913549289107324


Train, Epoch 7 / 20:  94%|█████████▎| 1464/1563 [01:00<00:04, 22.99it/s]

batch 1460 loss: 0.31673028618097304


Train, Epoch 7 / 20:  94%|█████████▍| 1473/1563 [01:01<00:04, 22.12it/s]

batch 1470 loss: 0.24625298231840134


Train, Epoch 7 / 20:  95%|█████████▍| 1482/1563 [01:01<00:03, 22.20it/s]

batch 1480 loss: 0.26133751720190046


Train, Epoch 7 / 20:  96%|█████████▌| 1494/1563 [01:02<00:03, 22.44it/s]

batch 1490 loss: 0.32195444256067274


Train, Epoch 7 / 20:  96%|█████████▌| 1503/1563 [01:02<00:02, 20.83it/s]

batch 1500 loss: 0.33642530292272566


Train, Epoch 7 / 20:  97%|█████████▋| 1515/1563 [01:03<00:02, 22.93it/s]

batch 1510 loss: 0.2399849072098732


Train, Epoch 7 / 20:  98%|█████████▊| 1524/1563 [01:03<00:01, 23.97it/s]

batch 1520 loss: 0.19687778651714324


Train, Epoch 7 / 20:  98%|█████████▊| 1533/1563 [01:04<00:01, 24.47it/s]

batch 1530 loss: 0.2064802635461092


Train, Epoch 7 / 20:  99%|█████████▉| 1545/1563 [01:04<00:00, 24.83it/s]

batch 1540 loss: 0.279713749140501


Train, Epoch 7 / 20:  99%|█████████▉| 1554/1563 [01:04<00:00, 24.79it/s]

batch 1550 loss: 0.2788370862603188


Train, Epoch 7 / 20: 100%|██████████| 1563/1563 [01:05<00:00, 23.97it/s]


batch 1560 loss: 0.3233940973877907


Test, Epoch 7 / 20: 100%|██████████| 1563/1563 [00:29<00:00, 52.61it/s]


Epoch 7, loss: 0.47272218555301426, accuracy: 0.79976


Train, Epoch 8 / 20:   1%|          | 12/1563 [00:00<01:03, 24.51it/s]

batch 10 loss: 0.25516330190002917


Train, Epoch 8 / 20:   2%|▏         | 24/1563 [00:00<01:02, 24.59it/s]

batch 20 loss: 0.27033828794956205


Train, Epoch 8 / 20:   2%|▏         | 33/1563 [00:01<01:01, 24.77it/s]

batch 30 loss: 0.2538119211792946


Train, Epoch 8 / 20:   3%|▎         | 45/1563 [00:01<01:01, 24.70it/s]

batch 40 loss: 0.3092827916145325


Train, Epoch 8 / 20:   3%|▎         | 54/1563 [00:02<01:00, 24.76it/s]

batch 50 loss: 0.3023091807961464


Train, Epoch 8 / 20:   4%|▍         | 63/1563 [00:02<01:02, 24.08it/s]

batch 60 loss: 0.25344905331730844


Train, Epoch 8 / 20:   5%|▍         | 72/1563 [00:02<01:05, 22.81it/s]

batch 70 loss: 0.3419714167714119


Train, Epoch 8 / 20:   5%|▌         | 84/1563 [00:03<01:07, 21.98it/s]

batch 80 loss: 0.38235448151826856


Train, Epoch 8 / 20:   6%|▌         | 93/1563 [00:03<01:05, 22.50it/s]

batch 90 loss: 0.25503516644239427


Train, Epoch 8 / 20:   7%|▋         | 102/1563 [00:04<01:07, 21.52it/s]

batch 100 loss: 0.2992377296090126


Train, Epoch 8 / 20:   7%|▋         | 114/1563 [00:04<01:05, 22.20it/s]

batch 110 loss: 0.24519265368580817


Train, Epoch 8 / 20:   8%|▊         | 123/1563 [00:05<01:00, 23.94it/s]

batch 120 loss: 0.25348447188735007


Train, Epoch 8 / 20:   8%|▊         | 132/1563 [00:05<00:58, 24.30it/s]

batch 130 loss: 0.23586399406194686


Train, Epoch 8 / 20:   9%|▉         | 144/1563 [00:06<00:57, 24.59it/s]

batch 140 loss: 0.2851927302777767


Train, Epoch 8 / 20:  10%|▉         | 153/1563 [00:06<00:56, 24.74it/s]

batch 150 loss: 0.24117894768714904


Train, Epoch 8 / 20:  11%|█         | 165/1563 [00:06<00:56, 24.88it/s]

batch 160 loss: 0.22350343838334083


Train, Epoch 8 / 20:  11%|█         | 174/1563 [00:07<00:55, 24.88it/s]

batch 170 loss: 0.38614930063486097


Train, Epoch 8 / 20:  12%|█▏        | 183/1563 [00:07<00:55, 24.74it/s]

batch 180 loss: 0.4106637969613075


Train, Epoch 8 / 20:  12%|█▏        | 195/1563 [00:08<00:55, 24.80it/s]

batch 190 loss: 0.3528593711555004


Train, Epoch 8 / 20:  13%|█▎        | 204/1563 [00:08<00:54, 24.79it/s]

batch 200 loss: 0.29560487642884253


Train, Epoch 8 / 20:  14%|█▎        | 213/1563 [00:08<00:54, 24.62it/s]

batch 210 loss: 0.2896487519145012


Train, Epoch 8 / 20:  14%|█▍        | 225/1563 [00:09<00:53, 24.88it/s]

batch 220 loss: 0.2652461126446724


Train, Epoch 8 / 20:  15%|█▍        | 234/1563 [00:09<00:54, 24.35it/s]

batch 230 loss: 0.33393368124961853


Train, Epoch 8 / 20:  16%|█▌        | 243/1563 [00:10<00:53, 24.69it/s]

batch 240 loss: 0.22345369383692743


Train, Epoch 8 / 20:  16%|█▋        | 255/1563 [00:10<00:52, 24.86it/s]

batch 250 loss: 0.24454424232244493


Train, Epoch 8 / 20:  17%|█▋        | 264/1563 [00:10<00:53, 24.51it/s]

batch 260 loss: 0.28898887038230897


Train, Epoch 8 / 20:  17%|█▋        | 273/1563 [00:11<00:52, 24.56it/s]

batch 270 loss: 0.20319241806864738


Train, Epoch 8 / 20:  18%|█▊        | 282/1563 [00:11<00:52, 24.45it/s]

batch 280 loss: 0.21389754712581635


Train, Epoch 8 / 20:  19%|█▉        | 294/1563 [00:12<00:51, 24.63it/s]

batch 290 loss: 0.23527833372354506


Train, Epoch 8 / 20:  19%|█▉        | 303/1563 [00:12<00:50, 24.74it/s]

batch 300 loss: 0.28346753790974616


Train, Epoch 8 / 20:  20%|██        | 315/1563 [00:13<00:50, 24.83it/s]

batch 310 loss: 0.2959387909621


Train, Epoch 8 / 20:  21%|██        | 324/1563 [00:13<00:50, 24.58it/s]

batch 320 loss: 0.34895150288939475


Train, Epoch 8 / 20:  21%|██▏       | 333/1563 [00:13<00:49, 24.65it/s]

batch 330 loss: 0.2979867160320282


Train, Epoch 8 / 20:  22%|██▏       | 345/1563 [00:14<00:48, 24.89it/s]

batch 340 loss: 0.28066085278987885


Train, Epoch 8 / 20:  23%|██▎       | 354/1563 [00:14<00:48, 24.79it/s]

batch 350 loss: 0.24434819296002389


Train, Epoch 8 / 20:  23%|██▎       | 363/1563 [00:15<00:51, 23.29it/s]

batch 360 loss: 0.36763903945684434


Train, Epoch 8 / 20:  24%|██▍       | 372/1563 [00:15<00:56, 21.16it/s]

batch 370 loss: 0.273877902328968


Train, Epoch 8 / 20:  25%|██▍       | 384/1563 [00:16<00:54, 21.63it/s]

batch 380 loss: 0.20798442140221596


Train, Epoch 8 / 20:  25%|██▌       | 393/1563 [00:16<00:52, 22.46it/s]

batch 390 loss: 0.2414885487407446


Train, Epoch 8 / 20:  26%|██▌       | 402/1563 [00:16<00:54, 21.23it/s]

batch 400 loss: 0.31751308441162107


Train, Epoch 8 / 20:  26%|██▋       | 414/1563 [00:17<00:51, 22.44it/s]

batch 410 loss: 0.32414280660450456


Train, Epoch 8 / 20:  27%|██▋       | 423/1563 [00:17<00:47, 23.77it/s]

batch 420 loss: 0.33927521109580994


Train, Epoch 8 / 20:  28%|██▊       | 435/1563 [00:18<00:45, 24.60it/s]

batch 430 loss: 0.2621593102812767


Train, Epoch 8 / 20:  28%|██▊       | 444/1563 [00:18<00:45, 24.78it/s]

batch 440 loss: 0.3117008492350578


Train, Epoch 8 / 20:  29%|██▉       | 453/1563 [00:18<00:45, 24.63it/s]

batch 450 loss: 0.2554462157189846


Train, Epoch 8 / 20:  30%|██▉       | 465/1563 [00:19<00:44, 24.76it/s]

batch 460 loss: 0.31876165270805357


Train, Epoch 8 / 20:  30%|███       | 474/1563 [00:19<00:44, 24.61it/s]

batch 470 loss: 0.2585930600762367


Train, Epoch 8 / 20:  31%|███       | 483/1563 [00:20<00:44, 24.46it/s]

batch 480 loss: 0.3283442385494709


Train, Epoch 8 / 20:  32%|███▏      | 495/1563 [00:20<00:42, 24.98it/s]

batch 490 loss: 0.2771469585597515


Train, Epoch 8 / 20:  32%|███▏      | 504/1563 [00:21<00:42, 24.90it/s]

batch 500 loss: 0.291900584846735


Train, Epoch 8 / 20:  33%|███▎      | 513/1563 [00:21<00:42, 24.72it/s]

batch 510 loss: 0.2755433090031147


Train, Epoch 8 / 20:  34%|███▎      | 525/1563 [00:21<00:42, 24.69it/s]

batch 520 loss: 0.3285327725112438


Train, Epoch 8 / 20:  34%|███▍      | 534/1563 [00:22<00:41, 24.54it/s]

batch 530 loss: 0.34624488204717635


Train, Epoch 8 / 20:  35%|███▍      | 543/1563 [00:22<00:40, 24.93it/s]

batch 540 loss: 0.28902303874492646


Train, Epoch 8 / 20:  36%|███▌      | 555/1563 [00:23<00:40, 24.87it/s]

batch 550 loss: 0.2701640471816063


Train, Epoch 8 / 20:  36%|███▌      | 564/1563 [00:23<00:40, 24.63it/s]

batch 560 loss: 0.2443135693669319


Train, Epoch 8 / 20:  37%|███▋      | 573/1563 [00:23<00:40, 24.69it/s]

batch 570 loss: 0.32523070871829984


Train, Epoch 8 / 20:  37%|███▋      | 585/1563 [00:24<00:39, 24.67it/s]

batch 580 loss: 0.35020338371396065


Train, Epoch 8 / 20:  38%|███▊      | 594/1563 [00:24<00:38, 25.01it/s]

batch 590 loss: 0.2575888931751251


Train, Epoch 8 / 20:  39%|███▊      | 603/1563 [00:25<00:38, 24.82it/s]

batch 600 loss: 0.2973852515220642


Train, Epoch 8 / 20:  39%|███▉      | 615/1563 [00:25<00:38, 24.68it/s]

batch 610 loss: 0.24574581608176232


Train, Epoch 8 / 20:  40%|███▉      | 624/1563 [00:25<00:37, 24.81it/s]

batch 620 loss: 0.29106305688619616


Train, Epoch 8 / 20:  40%|████      | 633/1563 [00:26<00:38, 24.40it/s]

batch 630 loss: 0.32407009303569795


Train, Epoch 8 / 20:  41%|████▏     | 645/1563 [00:26<00:37, 24.67it/s]

batch 640 loss: 0.35653412863612177


Train, Epoch 8 / 20:  42%|████▏     | 654/1563 [00:27<00:36, 24.86it/s]

batch 650 loss: 0.32528193220496177


Train, Epoch 8 / 20:  42%|████▏     | 663/1563 [00:27<00:39, 22.83it/s]

batch 660 loss: 0.31375954300165176


Train, Epoch 8 / 20:  43%|████▎     | 672/1563 [00:27<00:41, 21.60it/s]

batch 670 loss: 0.21068716272711754


Train, Epoch 8 / 20:  44%|████▍     | 684/1563 [00:28<00:41, 21.14it/s]

batch 680 loss: 0.2669059485197067


Train, Epoch 8 / 20:  44%|████▍     | 693/1563 [00:28<00:40, 21.52it/s]

batch 690 loss: 0.2946947455406189


Train, Epoch 8 / 20:  45%|████▍     | 702/1563 [00:29<00:41, 20.84it/s]

batch 700 loss: 0.2769915580749512


Train, Epoch 8 / 20:  46%|████▌     | 714/1563 [00:29<00:36, 23.40it/s]

batch 710 loss: 0.26822284460067747


Train, Epoch 8 / 20:  46%|████▋     | 723/1563 [00:30<00:34, 24.20it/s]

batch 720 loss: 0.3109105974435806


Train, Epoch 8 / 20:  47%|████▋     | 732/1563 [00:30<00:33, 24.60it/s]

batch 730 loss: 0.29194975420832636


Train, Epoch 8 / 20:  48%|████▊     | 744/1563 [00:31<00:33, 24.27it/s]

batch 740 loss: 0.35333543345332147


Train, Epoch 8 / 20:  48%|████▊     | 753/1563 [00:31<00:33, 24.08it/s]

batch 750 loss: 0.27481700778007506


Train, Epoch 8 / 20:  49%|████▉     | 765/1563 [00:31<00:32, 24.62it/s]

batch 760 loss: 0.3528481900691986


Train, Epoch 8 / 20:  50%|████▉     | 774/1563 [00:32<00:31, 24.70it/s]

batch 770 loss: 0.2875736117362976


Train, Epoch 8 / 20:  50%|█████     | 783/1563 [00:32<00:31, 24.70it/s]

batch 780 loss: 0.3358935132622719


Train, Epoch 8 / 20:  51%|█████     | 795/1563 [00:33<00:31, 24.69it/s]

batch 790 loss: 0.22242561504244804


Train, Epoch 8 / 20:  51%|█████▏    | 804/1563 [00:33<00:30, 24.73it/s]

batch 800 loss: 0.2565090924501419


Train, Epoch 8 / 20:  52%|█████▏    | 813/1563 [00:33<00:30, 24.77it/s]

batch 810 loss: 0.24782049059867858


Train, Epoch 8 / 20:  53%|█████▎    | 825/1563 [00:34<00:29, 24.71it/s]

batch 820 loss: 0.23112034946680068


Train, Epoch 8 / 20:  53%|█████▎    | 834/1563 [00:34<00:29, 24.75it/s]

batch 830 loss: 0.30968105494976045


Train, Epoch 8 / 20:  54%|█████▍    | 843/1563 [00:35<00:29, 24.41it/s]

batch 840 loss: 0.21459222361445426


Train, Epoch 8 / 20:  55%|█████▍    | 852/1563 [00:35<00:28, 24.56it/s]

batch 850 loss: 0.4038507789373398


Train, Epoch 8 / 20:  55%|█████▌    | 864/1563 [00:35<00:27, 24.98it/s]

batch 860 loss: 0.31697801426053046


Train, Epoch 8 / 20:  56%|█████▌    | 873/1563 [00:36<00:27, 24.85it/s]

batch 870 loss: 0.3427204839885235


Train, Epoch 8 / 20:  57%|█████▋    | 885/1563 [00:36<00:27, 24.69it/s]

batch 880 loss: 0.25641448348760604


Train, Epoch 8 / 20:  57%|█████▋    | 894/1563 [00:37<00:26, 24.97it/s]

batch 890 loss: 0.2619729794561863


Train, Epoch 8 / 20:  58%|█████▊    | 903/1563 [00:37<00:26, 24.73it/s]

batch 900 loss: 0.2103221245110035


Train, Epoch 8 / 20:  59%|█████▊    | 915/1563 [00:38<00:26, 24.81it/s]

batch 910 loss: 0.29012060910463333


Train, Epoch 8 / 20:  59%|█████▉    | 924/1563 [00:38<00:25, 24.81it/s]

batch 920 loss: 0.3057502120733261


Train, Epoch 8 / 20:  60%|█████▉    | 933/1563 [00:38<00:25, 24.68it/s]

batch 930 loss: 0.26520983725786207


Train, Epoch 8 / 20:  60%|██████    | 945/1563 [00:39<00:24, 24.82it/s]

batch 940 loss: 0.25846299678087237


Train, Epoch 8 / 20:  61%|██████    | 954/1563 [00:39<00:25, 24.10it/s]

batch 950 loss: 0.2459219887852669


Train, Epoch 8 / 20:  62%|██████▏   | 963/1563 [00:40<00:26, 22.34it/s]

batch 960 loss: 0.2713544607162476


Train, Epoch 8 / 20:  62%|██████▏   | 972/1563 [00:40<00:26, 22.67it/s]

batch 970 loss: 0.3171381726861


Train, Epoch 8 / 20:  63%|██████▎   | 984/1563 [00:41<00:26, 21.54it/s]

batch 980 loss: 0.29230527132749556


Train, Epoch 8 / 20:  64%|██████▎   | 993/1563 [00:41<00:27, 21.01it/s]

batch 990 loss: 0.265078192949295


Train, Epoch 8 / 20:  64%|██████▍   | 1002/1563 [00:41<00:27, 20.66it/s]

batch 1000 loss: 0.24948305413126945


Train, Epoch 8 / 20:  65%|██████▍   | 1014/1563 [00:42<00:23, 23.10it/s]

batch 1010 loss: 0.30768261253833773


Train, Epoch 8 / 20:  65%|██████▌   | 1023/1563 [00:42<00:22, 23.75it/s]

batch 1020 loss: 0.32990323901176455


Train, Epoch 8 / 20:  66%|██████▌   | 1035/1563 [00:43<00:21, 24.52it/s]

batch 1030 loss: 0.3035085812211037


Train, Epoch 8 / 20:  67%|██████▋   | 1044/1563 [00:43<00:21, 24.27it/s]

batch 1040 loss: 0.26297579184174535


Train, Epoch 8 / 20:  67%|██████▋   | 1053/1563 [00:44<00:20, 24.62it/s]

batch 1050 loss: 0.33247318416833876


Train, Epoch 8 / 20:  68%|██████▊   | 1065/1563 [00:44<00:20, 24.74it/s]

batch 1060 loss: 0.37112949788570404


Train, Epoch 8 / 20:  69%|██████▊   | 1074/1563 [00:44<00:19, 24.55it/s]

batch 1070 loss: 0.2596641771495342


Train, Epoch 8 / 20:  69%|██████▉   | 1083/1563 [00:45<00:19, 24.80it/s]

batch 1080 loss: 0.25021385699510573


Train, Epoch 8 / 20:  70%|██████▉   | 1092/1563 [00:45<00:19, 24.51it/s]

batch 1090 loss: 0.2900627590715885


Train, Epoch 8 / 20:  71%|███████   | 1104/1563 [00:46<00:19, 23.94it/s]

batch 1100 loss: 0.308623593300581


Train, Epoch 8 / 20:  71%|███████   | 1113/1563 [00:46<00:18, 24.49it/s]

batch 1110 loss: 0.24165388941764832


Train, Epoch 8 / 20:  72%|███████▏  | 1125/1563 [00:46<00:17, 24.81it/s]

batch 1120 loss: 0.28326725512742995


Train, Epoch 8 / 20:  73%|███████▎  | 1134/1563 [00:47<00:17, 24.78it/s]

batch 1130 loss: 0.3359262928366661


Train, Epoch 8 / 20:  73%|███████▎  | 1143/1563 [00:47<00:16, 24.76it/s]

batch 1140 loss: 0.29147293865680696


Train, Epoch 8 / 20:  74%|███████▍  | 1155/1563 [00:48<00:16, 24.81it/s]

batch 1150 loss: 0.30830543488264084


Train, Epoch 8 / 20:  74%|███████▍  | 1164/1563 [00:48<00:16, 24.69it/s]

batch 1160 loss: 0.33082434311509135


Train, Epoch 8 / 20:  75%|███████▌  | 1173/1563 [00:48<00:15, 24.53it/s]

batch 1170 loss: 0.3279191359877586


Train, Epoch 8 / 20:  76%|███████▌  | 1185/1563 [00:49<00:15, 24.81it/s]

batch 1180 loss: 0.2766828551888466


Train, Epoch 8 / 20:  76%|███████▋  | 1194/1563 [00:49<00:14, 24.76it/s]

batch 1190 loss: 0.336844664812088


Train, Epoch 8 / 20:  77%|███████▋  | 1203/1563 [00:50<00:14, 24.57it/s]

batch 1200 loss: 0.3417163461446762


Train, Epoch 8 / 20:  78%|███████▊  | 1215/1563 [00:50<00:14, 24.79it/s]

batch 1210 loss: 0.2745563112199306


Train, Epoch 8 / 20:  78%|███████▊  | 1224/1563 [00:50<00:13, 24.32it/s]

batch 1220 loss: 0.2816253200173378


Train, Epoch 8 / 20:  79%|███████▉  | 1233/1563 [00:51<00:13, 24.72it/s]

batch 1230 loss: 0.2365153029561043


Train, Epoch 8 / 20:  80%|███████▉  | 1245/1563 [00:51<00:12, 24.89it/s]

batch 1240 loss: 0.2890980660915375


Train, Epoch 8 / 20:  80%|████████  | 1254/1563 [00:52<00:12, 24.09it/s]

batch 1250 loss: 0.32122038304805756


Train, Epoch 8 / 20:  81%|████████  | 1263/1563 [00:52<00:13, 22.90it/s]

batch 1260 loss: 0.3012789815664291


Train, Epoch 8 / 20:  81%|████████▏ | 1272/1563 [00:52<00:12, 22.95it/s]

batch 1270 loss: 0.25038682445883753


Train, Epoch 8 / 20:  82%|████████▏ | 1284/1563 [00:53<00:12, 22.84it/s]

batch 1280 loss: 0.23893932178616523


Train, Epoch 8 / 20:  83%|████████▎ | 1293/1563 [00:53<00:12, 22.35it/s]

batch 1290 loss: 0.32962636947631835


Train, Epoch 8 / 20:  83%|████████▎ | 1302/1563 [00:54<00:12, 21.05it/s]

batch 1300 loss: 0.3161132723093033


Train, Epoch 8 / 20:  84%|████████▍ | 1314/1563 [00:54<00:10, 23.73it/s]

batch 1310 loss: 0.31049401611089705


Train, Epoch 8 / 20:  85%|████████▍ | 1323/1563 [00:55<00:09, 24.54it/s]

batch 1320 loss: 0.24936536997556685


Train, Epoch 8 / 20:  85%|████████▌ | 1335/1563 [00:55<00:09, 24.79it/s]

batch 1330 loss: 0.2762365497648716


Train, Epoch 8 / 20:  86%|████████▌ | 1344/1563 [00:56<00:08, 24.84it/s]

batch 1340 loss: 0.2986251175403595


Train, Epoch 8 / 20:  87%|████████▋ | 1353/1563 [00:56<00:08, 24.64it/s]

batch 1350 loss: 0.21468350440263748


Train, Epoch 8 / 20:  87%|████████▋ | 1365/1563 [00:56<00:08, 24.73it/s]

batch 1360 loss: 0.30968987569212914


Train, Epoch 8 / 20:  88%|████████▊ | 1374/1563 [00:57<00:07, 24.57it/s]

batch 1370 loss: 0.32158771976828576


Train, Epoch 8 / 20:  88%|████████▊ | 1383/1563 [00:57<00:07, 24.49it/s]

batch 1380 loss: 0.2425489455461502


Train, Epoch 8 / 20:  89%|████████▉ | 1395/1563 [00:58<00:06, 24.92it/s]

batch 1390 loss: 0.205736718326807


Train, Epoch 8 / 20:  90%|████████▉ | 1404/1563 [00:58<00:06, 24.70it/s]

batch 1400 loss: 0.23170458897948265


Train, Epoch 8 / 20:  90%|█████████ | 1413/1563 [00:58<00:06, 24.76it/s]

batch 1410 loss: 0.2430270716547966


Train, Epoch 8 / 20:  91%|█████████ | 1425/1563 [00:59<00:05, 24.76it/s]

batch 1420 loss: 0.3414906054735184


Train, Epoch 8 / 20:  92%|█████████▏| 1434/1563 [00:59<00:05, 24.88it/s]

batch 1430 loss: 0.331860314309597


Train, Epoch 8 / 20:  92%|█████████▏| 1443/1563 [01:00<00:04, 24.84it/s]

batch 1440 loss: 0.2756267085671425


Train, Epoch 8 / 20:  93%|█████████▎| 1455/1563 [01:00<00:04, 24.75it/s]

batch 1450 loss: 0.2916610687971115


Train, Epoch 8 / 20:  94%|█████████▎| 1464/1563 [01:00<00:04, 24.62it/s]

batch 1460 loss: 0.2980972856283188


Train, Epoch 8 / 20:  94%|█████████▍| 1473/1563 [01:01<00:03, 24.46it/s]

batch 1470 loss: 0.3097869664430618


Train, Epoch 8 / 20:  95%|█████████▌| 1485/1563 [01:01<00:03, 24.66it/s]

batch 1480 loss: 0.3154947556555271


Train, Epoch 8 / 20:  96%|█████████▌| 1494/1563 [01:02<00:02, 24.76it/s]

batch 1490 loss: 0.28327969163656236


Train, Epoch 8 / 20:  96%|█████████▌| 1503/1563 [01:02<00:02, 24.68it/s]

batch 1500 loss: 0.35854734629392626


Train, Epoch 8 / 20:  97%|█████████▋| 1515/1563 [01:03<00:01, 24.78it/s]

batch 1510 loss: 0.24493960440158843


Train, Epoch 8 / 20:  98%|█████████▊| 1524/1563 [01:03<00:01, 24.57it/s]

batch 1520 loss: 0.32934921979904175


Train, Epoch 8 / 20:  98%|█████████▊| 1533/1563 [01:03<00:01, 24.81it/s]

batch 1530 loss: 0.30257818698883054


Train, Epoch 8 / 20:  99%|█████████▉| 1545/1563 [01:04<00:00, 24.76it/s]

batch 1540 loss: 0.31557936891913413


Train, Epoch 8 / 20:  99%|█████████▉| 1554/1563 [01:04<00:00, 23.55it/s]

batch 1550 loss: 0.2946524456143379


Train, Epoch 8 / 20: 100%|██████████| 1563/1563 [01:05<00:00, 24.03it/s]


batch 1560 loss: 0.3181171163916588


Test, Epoch 8 / 20: 100%|██████████| 1563/1563 [00:30<00:00, 51.72it/s]


Epoch 8, loss: 0.45639348453968764, accuracy: 0.80512


Train, Epoch 9 / 20:   1%|          | 15/1563 [00:00<01:02, 24.64it/s]

batch 10 loss: 0.22237298265099525


Train, Epoch 9 / 20:   2%|▏         | 24/1563 [00:00<01:02, 24.57it/s]

batch 20 loss: 0.29061131179332733


Train, Epoch 9 / 20:   2%|▏         | 33/1563 [00:01<01:01, 24.71it/s]

batch 30 loss: 0.28773987144231794


Train, Epoch 9 / 20:   3%|▎         | 45/1563 [00:01<01:01, 24.83it/s]

batch 40 loss: 0.2364288866519928


Train, Epoch 9 / 20:   3%|▎         | 54/1563 [00:02<01:01, 24.71it/s]

batch 50 loss: 0.27774215266108515


Train, Epoch 9 / 20:   4%|▍         | 63/1563 [00:02<01:00, 24.69it/s]

batch 60 loss: 0.317693492770195


Train, Epoch 9 / 20:   5%|▍         | 72/1563 [00:02<01:00, 24.68it/s]

batch 70 loss: 0.329186936467886


Train, Epoch 9 / 20:   5%|▌         | 84/1563 [00:03<00:59, 24.76it/s]

batch 80 loss: 0.2878958165645599


Train, Epoch 9 / 20:   6%|▌         | 93/1563 [00:03<00:59, 24.70it/s]

batch 90 loss: 0.2187580019235611


Train, Epoch 9 / 20:   7%|▋         | 105/1563 [00:04<00:59, 24.55it/s]

batch 100 loss: 0.31351585537195203


Train, Epoch 9 / 20:   7%|▋         | 114/1563 [00:04<00:58, 24.82it/s]

batch 110 loss: 0.21544888392090797


Train, Epoch 9 / 20:   8%|▊         | 123/1563 [00:04<00:58, 24.67it/s]

batch 120 loss: 0.2734500139951706


Train, Epoch 9 / 20:   9%|▊         | 135/1563 [00:05<00:57, 24.85it/s]

batch 130 loss: 0.22542952038347722


Train, Epoch 9 / 20:   9%|▉         | 144/1563 [00:05<00:57, 24.89it/s]

batch 140 loss: 0.2600521631538868


Train, Epoch 9 / 20:  10%|▉         | 153/1563 [00:06<00:56, 24.87it/s]

batch 150 loss: 0.21391021758317946


Train, Epoch 9 / 20:  10%|█         | 162/1563 [00:06<01:00, 23.06it/s]

batch 160 loss: 0.31970455050468444


Train, Epoch 9 / 20:  11%|█         | 174/1563 [00:07<01:02, 22.20it/s]

batch 170 loss: 0.2975771889090538


Train, Epoch 9 / 20:  12%|█▏        | 183/1563 [00:07<01:01, 22.40it/s]

batch 180 loss: 0.27468984946608543


Train, Epoch 9 / 20:  12%|█▏        | 192/1563 [00:07<01:01, 22.12it/s]

batch 190 loss: 0.2832423664629459


Train, Epoch 9 / 20:  13%|█▎        | 204/1563 [00:08<01:04, 21.02it/s]

batch 200 loss: 0.27897958382964133


Train, Epoch 9 / 20:  14%|█▎        | 213/1563 [00:08<00:58, 23.15it/s]

batch 210 loss: 0.28437661603093145


Train, Epoch 9 / 20:  14%|█▍        | 225/1563 [00:09<00:54, 24.37it/s]

batch 220 loss: 0.22778441831469537


Train, Epoch 9 / 20:  15%|█▍        | 234/1563 [00:09<00:53, 24.65it/s]

batch 230 loss: 0.31042017191648485


Train, Epoch 9 / 20:  16%|█▌        | 243/1563 [00:10<00:53, 24.70it/s]

batch 240 loss: 0.3383623890578747


Train, Epoch 9 / 20:  16%|█▋        | 255/1563 [00:10<00:52, 24.95it/s]

batch 250 loss: 0.22284768521785736


Train, Epoch 9 / 20:  17%|█▋        | 264/1563 [00:11<00:53, 24.27it/s]

batch 260 loss: 0.24269395172595978


Train, Epoch 9 / 20:  17%|█▋        | 273/1563 [00:11<00:53, 24.25it/s]

batch 270 loss: 0.25274073854088785


Train, Epoch 9 / 20:  18%|█▊        | 285/1563 [00:11<00:51, 24.64it/s]

batch 280 loss: 0.21882502883672714


Train, Epoch 9 / 20:  19%|█▉        | 294/1563 [00:12<00:51, 24.72it/s]

batch 290 loss: 0.29993632808327675


Train, Epoch 9 / 20:  19%|█▉        | 303/1563 [00:12<00:51, 24.62it/s]

batch 300 loss: 0.2552404090762138


Train, Epoch 9 / 20:  20%|██        | 315/1563 [00:13<00:50, 24.77it/s]

batch 310 loss: 0.3383495852351189


Train, Epoch 9 / 20:  21%|██        | 324/1563 [00:13<00:50, 24.63it/s]

batch 320 loss: 0.2630695790052414


Train, Epoch 9 / 20:  21%|██▏       | 333/1563 [00:13<00:49, 24.86it/s]

batch 330 loss: 0.21121898107230663


Train, Epoch 9 / 20:  22%|██▏       | 345/1563 [00:14<00:49, 24.65it/s]

batch 340 loss: 0.21684909239411354


Train, Epoch 9 / 20:  23%|██▎       | 354/1563 [00:14<00:49, 24.35it/s]

batch 350 loss: 0.21515389680862426


Train, Epoch 9 / 20:  23%|██▎       | 363/1563 [00:15<00:49, 24.35it/s]

batch 360 loss: 0.24624540731310846


Train, Epoch 9 / 20:  24%|██▍       | 375/1563 [00:15<00:48, 24.67it/s]

batch 370 loss: 0.38705344349145887


Train, Epoch 9 / 20:  25%|██▍       | 384/1563 [00:15<00:47, 24.85it/s]

batch 380 loss: 0.2807975634932518


Train, Epoch 9 / 20:  25%|██▌       | 393/1563 [00:16<00:47, 24.75it/s]

batch 390 loss: 0.26389207169413564


Train, Epoch 9 / 20:  26%|██▌       | 402/1563 [00:16<00:46, 24.80it/s]

batch 400 loss: 0.3297661833465099


Train, Epoch 9 / 20:  26%|██▋       | 414/1563 [00:17<00:46, 24.61it/s]

batch 410 loss: 0.3341529980301857


Train, Epoch 9 / 20:  27%|██▋       | 423/1563 [00:17<00:46, 24.59it/s]

batch 420 loss: 0.3040073364973068


Train, Epoch 9 / 20:  28%|██▊       | 435/1563 [00:17<00:45, 24.71it/s]

batch 430 loss: 0.27972463592886926


Train, Epoch 9 / 20:  28%|██▊       | 444/1563 [00:18<00:44, 24.87it/s]

batch 440 loss: 0.2930921345949173


Train, Epoch 9 / 20:  29%|██▉       | 453/1563 [00:18<00:46, 24.06it/s]

batch 450 loss: 0.25490559712052346


Train, Epoch 9 / 20:  30%|██▉       | 462/1563 [00:19<00:48, 22.50it/s]

batch 460 loss: 0.2766289435327053


Train, Epoch 9 / 20:  30%|███       | 474/1563 [00:19<00:51, 21.19it/s]

batch 470 loss: 0.25028085261583327


Train, Epoch 9 / 20:  31%|███       | 483/1563 [00:20<00:49, 21.88it/s]

batch 480 loss: 0.22320767492055893


Train, Epoch 9 / 20:  31%|███▏      | 492/1563 [00:20<00:49, 21.71it/s]

batch 490 loss: 0.26101064682006836


Train, Epoch 9 / 20:  32%|███▏      | 504/1563 [00:21<00:46, 22.76it/s]

batch 500 loss: 0.27485300302505494


Train, Epoch 9 / 20:  33%|███▎      | 513/1563 [00:21<00:43, 23.97it/s]

batch 510 loss: 0.23752287700772284


Train, Epoch 9 / 20:  34%|███▎      | 525/1563 [00:21<00:42, 24.50it/s]

batch 520 loss: 0.18559330925345421


Train, Epoch 9 / 20:  34%|███▍      | 534/1563 [00:22<00:41, 24.51it/s]

batch 530 loss: 0.2903110384941101


Train, Epoch 9 / 20:  35%|███▍      | 543/1563 [00:22<00:41, 24.79it/s]

batch 540 loss: 0.25687482878565787


Train, Epoch 9 / 20:  36%|███▌      | 555/1563 [00:23<00:40, 24.91it/s]

batch 550 loss: 0.19381727054715156


Train, Epoch 9 / 20:  36%|███▌      | 564/1563 [00:23<00:40, 24.80it/s]

batch 560 loss: 0.2180707611143589


Train, Epoch 9 / 20:  37%|███▋      | 573/1563 [00:23<00:39, 24.76it/s]

batch 570 loss: 0.24843878522515297


Train, Epoch 9 / 20:  37%|███▋      | 585/1563 [00:24<00:39, 24.69it/s]

batch 580 loss: 0.29253111183643343


Train, Epoch 9 / 20:  38%|███▊      | 594/1563 [00:24<00:39, 24.50it/s]

batch 590 loss: 0.2632426455616951


Train, Epoch 9 / 20:  39%|███▊      | 603/1563 [00:25<00:38, 24.76it/s]

batch 600 loss: 0.27633014395833017


Train, Epoch 9 / 20:  39%|███▉      | 615/1563 [00:25<00:38, 24.73it/s]

batch 610 loss: 0.3124226525425911


Train, Epoch 9 / 20:  40%|███▉      | 624/1563 [00:25<00:38, 24.63it/s]

batch 620 loss: 0.24817370921373366


Train, Epoch 9 / 20:  40%|████      | 633/1563 [00:26<00:38, 24.19it/s]

batch 630 loss: 0.31961658373475077


Train, Epoch 9 / 20:  41%|████      | 642/1563 [00:26<00:38, 23.97it/s]

batch 640 loss: 0.329194913059473


Train, Epoch 9 / 20:  42%|████▏     | 654/1563 [00:27<00:36, 24.69it/s]

batch 650 loss: 0.2241718165576458


Train, Epoch 9 / 20:  42%|████▏     | 663/1563 [00:27<00:36, 24.71it/s]

batch 660 loss: 0.22830233462154864


Train, Epoch 9 / 20:  43%|████▎     | 675/1563 [00:28<00:35, 24.69it/s]

batch 670 loss: 0.35908946245908735


Train, Epoch 9 / 20:  44%|████▍     | 684/1563 [00:28<00:35, 24.68it/s]

batch 680 loss: 0.327138564735651


Train, Epoch 9 / 20:  44%|████▍     | 693/1563 [00:28<00:35, 24.62it/s]

batch 690 loss: 0.2330816902220249


Train, Epoch 9 / 20:  45%|████▌     | 705/1563 [00:29<00:34, 24.76it/s]

batch 700 loss: 0.28083673119544983


Train, Epoch 9 / 20:  46%|████▌     | 714/1563 [00:29<00:34, 24.87it/s]

batch 710 loss: 0.29104426577687265


Train, Epoch 9 / 20:  46%|████▋     | 723/1563 [00:29<00:33, 24.76it/s]

batch 720 loss: 0.27308716252446175


Train, Epoch 9 / 20:  47%|████▋     | 735/1563 [00:30<00:33, 24.58it/s]

batch 730 loss: 0.2502989761531353


Train, Epoch 9 / 20:  48%|████▊     | 744/1563 [00:30<00:34, 23.84it/s]

batch 740 loss: 0.3043126255273819


Train, Epoch 9 / 20:  48%|████▊     | 753/1563 [00:31<00:36, 22.31it/s]

batch 750 loss: 0.21434525027871132


Train, Epoch 9 / 20:  49%|████▉     | 762/1563 [00:31<00:37, 21.35it/s]

batch 760 loss: 0.20124224424362183


Train, Epoch 9 / 20:  50%|████▉     | 774/1563 [00:32<00:37, 20.89it/s]

batch 770 loss: 0.28152220249176024


Train, Epoch 9 / 20:  50%|█████     | 783/1563 [00:32<00:36, 21.13it/s]

batch 780 loss: 0.29621867537498475


Train, Epoch 9 / 20:  51%|█████     | 792/1563 [00:33<00:36, 21.32it/s]

batch 790 loss: 0.26637687496840956


Train, Epoch 9 / 20:  51%|█████▏    | 804/1563 [00:33<00:32, 23.10it/s]

batch 800 loss: 0.3502251014113426


Train, Epoch 9 / 20:  52%|█████▏    | 813/1563 [00:34<00:31, 24.00it/s]

batch 810 loss: 0.21792075634002686


Train, Epoch 9 / 20:  53%|█████▎    | 825/1563 [00:34<00:29, 24.65it/s]

batch 820 loss: 0.3029404848814011


Train, Epoch 9 / 20:  53%|█████▎    | 834/1563 [00:34<00:29, 24.75it/s]

batch 830 loss: 0.29832848012447355


Train, Epoch 9 / 20:  54%|█████▍    | 843/1563 [00:35<00:29, 24.09it/s]

batch 840 loss: 0.26952745616436


Train, Epoch 9 / 20:  55%|█████▍    | 855/1563 [00:35<00:28, 24.53it/s]

batch 850 loss: 0.3175401329994202


Train, Epoch 9 / 20:  55%|█████▌    | 864/1563 [00:36<00:28, 24.47it/s]

batch 860 loss: 0.2727502502501011


Train, Epoch 9 / 20:  56%|█████▌    | 873/1563 [00:36<00:28, 24.56it/s]

batch 870 loss: 0.27811290323734283


Train, Epoch 9 / 20:  57%|█████▋    | 885/1563 [00:36<00:27, 24.43it/s]

batch 880 loss: 0.2740523055195808


Train, Epoch 9 / 20:  57%|█████▋    | 894/1563 [00:37<00:27, 24.75it/s]

batch 890 loss: 0.2514911025762558


Train, Epoch 9 / 20:  58%|█████▊    | 903/1563 [00:37<00:26, 24.79it/s]

batch 900 loss: 0.26183436512947084


Train, Epoch 9 / 20:  59%|█████▊    | 915/1563 [00:38<00:26, 24.74it/s]

batch 910 loss: 0.2877300813794136


Train, Epoch 9 / 20:  59%|█████▉    | 924/1563 [00:38<00:26, 24.51it/s]

batch 920 loss: 0.23944796100258828


Train, Epoch 9 / 20:  60%|█████▉    | 933/1563 [00:38<00:25, 24.71it/s]

batch 930 loss: 0.3711061835289001


Train, Epoch 9 / 20:  60%|██████    | 945/1563 [00:39<00:24, 24.77it/s]

batch 940 loss: 0.29440496638417246


Train, Epoch 9 / 20:  61%|██████    | 954/1563 [00:39<00:24, 24.86it/s]

batch 950 loss: 0.3301020137965679


Train, Epoch 9 / 20:  62%|██████▏   | 963/1563 [00:40<00:24, 24.72it/s]

batch 960 loss: 0.27024596855044364


Train, Epoch 9 / 20:  62%|██████▏   | 975/1563 [00:40<00:23, 24.87it/s]

batch 970 loss: 0.29636642560362814


Train, Epoch 9 / 20:  63%|██████▎   | 984/1563 [00:40<00:23, 24.29it/s]

batch 980 loss: 0.3770240515470505


Train, Epoch 9 / 20:  64%|██████▎   | 993/1563 [00:41<00:23, 23.90it/s]

batch 990 loss: 0.34361082017421724


Train, Epoch 9 / 20:  64%|██████▍   | 1005/1563 [00:41<00:22, 24.74it/s]

batch 1000 loss: 0.25572954714298246


Train, Epoch 9 / 20:  65%|██████▍   | 1014/1563 [00:42<00:22, 24.69it/s]

batch 1010 loss: 0.34070208221673964


Train, Epoch 9 / 20:  65%|██████▌   | 1023/1563 [00:42<00:21, 24.73it/s]

batch 1020 loss: 0.30264865458011625


Train, Epoch 9 / 20:  66%|██████▌   | 1035/1563 [00:43<00:21, 24.66it/s]

batch 1030 loss: 0.3527009509503841


Train, Epoch 9 / 20:  67%|██████▋   | 1044/1563 [00:43<00:21, 23.93it/s]

batch 1040 loss: 0.24462642669677734


Train, Epoch 9 / 20:  67%|██████▋   | 1053/1563 [00:43<00:22, 23.15it/s]

batch 1050 loss: 0.2346271239221096


Train, Epoch 9 / 20:  68%|██████▊   | 1062/1563 [00:44<00:21, 23.16it/s]

batch 1060 loss: 0.25720544308424


Train, Epoch 9 / 20:  69%|██████▊   | 1074/1563 [00:44<00:21, 23.12it/s]

batch 1070 loss: 0.2148047626018524


Train, Epoch 9 / 20:  69%|██████▉   | 1083/1563 [00:45<00:21, 22.33it/s]

batch 1080 loss: 0.3024756357073784


Train, Epoch 9 / 20:  70%|██████▉   | 1092/1563 [00:45<00:21, 22.05it/s]

batch 1090 loss: 0.2345667786896229


Train, Epoch 9 / 20:  71%|███████   | 1104/1563 [00:46<00:19, 23.72it/s]

batch 1100 loss: 0.30592572391033174


Train, Epoch 9 / 20:  71%|███████   | 1113/1563 [00:46<00:18, 24.16it/s]

batch 1110 loss: 0.3147065550088882


Train, Epoch 9 / 20:  72%|███████▏  | 1125/1563 [00:46<00:17, 24.53it/s]

batch 1120 loss: 0.27284877374768257


Train, Epoch 9 / 20:  73%|███████▎  | 1134/1563 [00:47<00:17, 23.89it/s]

batch 1130 loss: 0.2586067162454128


Train, Epoch 9 / 20:  73%|███████▎  | 1143/1563 [00:47<00:17, 24.35it/s]

batch 1140 loss: 0.2515276290476322


Train, Epoch 9 / 20:  74%|███████▍  | 1155/1563 [00:48<00:16, 24.55it/s]

batch 1150 loss: 0.3200408138334751


Train, Epoch 9 / 20:  74%|███████▍  | 1164/1563 [00:48<00:16, 24.63it/s]

batch 1160 loss: 0.26698225662112235


Train, Epoch 9 / 20:  75%|███████▌  | 1173/1563 [00:48<00:15, 24.60it/s]

batch 1170 loss: 0.2873413681983948


Train, Epoch 9 / 20:  76%|███████▌  | 1185/1563 [00:49<00:15, 24.68it/s]

batch 1180 loss: 0.2593444660305977


Train, Epoch 9 / 20:  76%|███████▋  | 1194/1563 [00:49<00:15, 24.43it/s]

batch 1190 loss: 0.3661955431103706


Train, Epoch 9 / 20:  77%|███████▋  | 1203/1563 [00:50<00:14, 24.73it/s]

batch 1200 loss: 0.3149050667881966


Train, Epoch 9 / 20:  78%|███████▊  | 1215/1563 [00:50<00:13, 24.89it/s]

batch 1210 loss: 0.3048096425831318


Train, Epoch 9 / 20:  78%|███████▊  | 1224/1563 [00:50<00:13, 24.86it/s]

batch 1220 loss: 0.300959599763155


Train, Epoch 9 / 20:  79%|███████▉  | 1233/1563 [00:51<00:13, 24.72it/s]

batch 1230 loss: 0.23256939686834813


Train, Epoch 9 / 20:  80%|███████▉  | 1245/1563 [00:51<00:12, 24.85it/s]

batch 1240 loss: 0.2961549453437328


Train, Epoch 9 / 20:  80%|████████  | 1254/1563 [00:52<00:12, 24.77it/s]

batch 1250 loss: 0.17008532881736754


Train, Epoch 9 / 20:  81%|████████  | 1263/1563 [00:52<00:12, 24.73it/s]

batch 1260 loss: 0.2598601870238781


Train, Epoch 9 / 20:  81%|████████▏ | 1272/1563 [00:52<00:11, 24.71it/s]

batch 1270 loss: 0.3394962713122368


Train, Epoch 9 / 20:  82%|████████▏ | 1284/1563 [00:53<00:11, 24.62it/s]

batch 1280 loss: 0.3065133675932884


Train, Epoch 9 / 20:  83%|████████▎ | 1293/1563 [00:53<00:10, 24.79it/s]

batch 1290 loss: 0.3348499648272991


Train, Epoch 9 / 20:  83%|████████▎ | 1305/1563 [00:54<00:10, 24.83it/s]

batch 1300 loss: 0.27922049462795256


Train, Epoch 9 / 20:  84%|████████▍ | 1314/1563 [00:54<00:10, 24.67it/s]

batch 1310 loss: 0.2692758485674858


Train, Epoch 9 / 20:  85%|████████▍ | 1323/1563 [00:55<00:09, 24.60it/s]

batch 1320 loss: 0.3159219339489937


Train, Epoch 9 / 20:  85%|████████▌ | 1332/1563 [00:55<00:09, 24.47it/s]

batch 1330 loss: 0.27293719798326493


Train, Epoch 9 / 20:  86%|████████▌ | 1344/1563 [00:55<00:09, 24.14it/s]

batch 1340 loss: 0.2591414138674736


Train, Epoch 9 / 20:  87%|████████▋ | 1353/1563 [00:56<00:09, 23.01it/s]

batch 1350 loss: 0.2529101461172104


Train, Epoch 9 / 20:  87%|████████▋ | 1362/1563 [00:56<00:09, 22.17it/s]

batch 1360 loss: 0.283264571428299


Train, Epoch 9 / 20:  88%|████████▊ | 1374/1563 [00:57<00:08, 21.71it/s]

batch 1370 loss: 0.2502674862742424


Train, Epoch 9 / 20:  88%|████████▊ | 1383/1563 [00:57<00:08, 21.56it/s]

batch 1380 loss: 0.3170065596699715


Train, Epoch 9 / 20:  89%|████████▉ | 1392/1563 [00:58<00:08, 20.69it/s]

batch 1390 loss: 0.2901325456798077


Train, Epoch 9 / 20:  90%|████████▉ | 1404/1563 [00:58<00:06, 23.43it/s]

batch 1400 loss: 0.28303245827555656


Train, Epoch 9 / 20:  90%|█████████ | 1413/1563 [00:58<00:06, 24.19it/s]

batch 1410 loss: 0.2743615090847015


Train, Epoch 9 / 20:  91%|█████████ | 1425/1563 [00:59<00:05, 24.47it/s]

batch 1420 loss: 0.29644088000059127


Train, Epoch 9 / 20:  92%|█████████▏| 1434/1563 [00:59<00:05, 24.35it/s]

batch 1430 loss: 0.24957113936543465


Train, Epoch 9 / 20:  92%|█████████▏| 1443/1563 [01:00<00:04, 24.49it/s]

batch 1440 loss: 0.19886974170804023


Train, Epoch 9 / 20:  93%|█████████▎| 1455/1563 [01:00<00:04, 24.82it/s]

batch 1450 loss: 0.34747690707445145


Train, Epoch 9 / 20:  94%|█████████▎| 1464/1563 [01:01<00:04, 24.60it/s]

batch 1460 loss: 0.28061140030622483


Train, Epoch 9 / 20:  94%|█████████▍| 1473/1563 [01:01<00:03, 24.67it/s]

batch 1470 loss: 0.18467142134904863


Train, Epoch 9 / 20:  95%|█████████▌| 1485/1563 [01:01<00:03, 24.68it/s]

batch 1480 loss: 0.34220951423048973


Train, Epoch 9 / 20:  96%|█████████▌| 1494/1563 [01:02<00:02, 24.93it/s]

batch 1490 loss: 0.33941956907510756


Train, Epoch 9 / 20:  96%|█████████▌| 1503/1563 [01:02<00:02, 24.74it/s]

batch 1500 loss: 0.26797334402799605


Train, Epoch 9 / 20:  97%|█████████▋| 1515/1563 [01:03<00:01, 24.93it/s]

batch 1510 loss: 0.3176534056663513


Train, Epoch 9 / 20:  98%|█████████▊| 1524/1563 [01:03<00:01, 24.77it/s]

batch 1520 loss: 0.2861692041158676


Train, Epoch 9 / 20:  98%|█████████▊| 1533/1563 [01:03<00:01, 24.47it/s]

batch 1530 loss: 0.2208839774131775


Train, Epoch 9 / 20:  99%|█████████▉| 1545/1563 [01:04<00:00, 24.58it/s]

batch 1540 loss: 0.28144084215164183


Train, Epoch 9 / 20:  99%|█████████▉| 1554/1563 [01:04<00:00, 24.63it/s]

batch 1550 loss: 0.3204451620578766


Train, Epoch 9 / 20: 100%|██████████| 1563/1563 [01:05<00:00, 24.02it/s]


batch 1560 loss: 0.3154409423470497


Test, Epoch 9 / 20: 100%|██████████| 1563/1563 [00:30<00:00, 51.07it/s]


Epoch 9, loss: 0.46509002470195293, accuracy: 0.80456


Train, Epoch 10 / 20:   1%|          | 15/1563 [00:00<01:02, 24.62it/s]

batch 10 loss: 0.2503633894026279


Train, Epoch 10 / 20:   2%|▏         | 24/1563 [00:00<01:02, 24.69it/s]

batch 20 loss: 0.2370956439524889


Train, Epoch 10 / 20:   2%|▏         | 33/1563 [00:01<01:02, 24.33it/s]

batch 30 loss: 0.3048309415578842


Train, Epoch 10 / 20:   3%|▎         | 45/1563 [00:01<01:02, 24.44it/s]

batch 40 loss: 0.2653053104877472


Train, Epoch 10 / 20:   3%|▎         | 54/1563 [00:02<01:01, 24.42it/s]

batch 50 loss: 0.22103567123413087


Train, Epoch 10 / 20:   4%|▍         | 63/1563 [00:02<01:03, 23.70it/s]

batch 60 loss: 0.21962944120168687


Train, Epoch 10 / 20:   5%|▍         | 72/1563 [00:02<01:01, 24.26it/s]

batch 70 loss: 0.18541738539934158


Train, Epoch 10 / 20:   5%|▌         | 84/1563 [00:03<01:01, 24.13it/s]

batch 80 loss: 0.3225243717432022


Train, Epoch 10 / 20:   6%|▌         | 93/1563 [00:03<01:00, 24.43it/s]

batch 90 loss: 0.2881683215498924


Train, Epoch 10 / 20:   7%|▋         | 102/1563 [00:04<00:59, 24.37it/s]

batch 100 loss: 0.22748560905456544


Train, Epoch 10 / 20:   7%|▋         | 114/1563 [00:04<00:59, 24.52it/s]

batch 110 loss: 0.250778466463089


Train, Epoch 10 / 20:   8%|▊         | 123/1563 [00:05<00:59, 24.34it/s]

batch 120 loss: 0.2481578841805458


Train, Epoch 10 / 20:   9%|▊         | 135/1563 [00:05<00:58, 24.48it/s]

batch 130 loss: 0.2183644652366638


Train, Epoch 10 / 20:   9%|▉         | 144/1563 [00:05<00:58, 24.39it/s]

batch 140 loss: 0.2428643561899662


Train, Epoch 10 / 20:  10%|▉         | 153/1563 [00:06<00:59, 23.65it/s]

batch 150 loss: 0.3414902612566948


Train, Epoch 10 / 20:  11%|█         | 165/1563 [00:06<00:57, 24.12it/s]

batch 160 loss: 0.24243163987994193


Train, Epoch 10 / 20:  11%|█         | 174/1563 [00:07<00:56, 24.42it/s]

batch 170 loss: 0.25528679564595225


Train, Epoch 10 / 20:  12%|█▏        | 183/1563 [00:07<00:57, 24.09it/s]

batch 180 loss: 0.22760459929704666


Train, Epoch 10 / 20:  12%|█▏        | 195/1563 [00:08<00:55, 24.46it/s]

batch 190 loss: 0.280350935459137


Train, Epoch 10 / 20:  13%|█▎        | 204/1563 [00:08<00:55, 24.49it/s]

batch 200 loss: 0.2363944508135319


Train, Epoch 10 / 20:  14%|█▎        | 213/1563 [00:08<00:54, 24.57it/s]

batch 210 loss: 0.2728255517780781


Train, Epoch 10 / 20:  14%|█▍        | 222/1563 [00:09<00:54, 24.56it/s]

batch 220 loss: 0.2364900939166546


Train, Epoch 10 / 20:  15%|█▍        | 234/1563 [00:09<00:54, 24.22it/s]

batch 230 loss: 0.3099446684122086


Train, Epoch 10 / 20:  16%|█▌        | 243/1563 [00:10<00:58, 22.75it/s]

batch 240 loss: 0.2510800860822201


Train, Epoch 10 / 20:  16%|█▌        | 252/1563 [00:10<01:00, 21.58it/s]

batch 250 loss: 0.2721540242433548


Train, Epoch 10 / 20:  17%|█▋        | 264/1563 [00:11<01:00, 21.37it/s]

batch 260 loss: 0.344561680406332


Train, Epoch 10 / 20:  17%|█▋        | 273/1563 [00:11<01:02, 20.67it/s]

batch 270 loss: 0.28375293537974355


Train, Epoch 10 / 20:  18%|█▊        | 282/1563 [00:11<01:00, 21.27it/s]

batch 280 loss: 0.32368946373462676


Train, Epoch 10 / 20:  19%|█▉        | 294/1563 [00:12<00:56, 22.51it/s]

batch 290 loss: 0.27935843616724015


Train, Epoch 10 / 20:  19%|█▉        | 303/1563 [00:12<00:53, 23.74it/s]

batch 300 loss: 0.29805096089839933


Train, Epoch 10 / 20:  20%|█▉        | 312/1563 [00:13<00:52, 23.96it/s]

batch 310 loss: 0.24915266856551171


Train, Epoch 10 / 20:  21%|██        | 324/1563 [00:13<00:50, 24.30it/s]

batch 320 loss: 0.2532946072518826


Train, Epoch 10 / 20:  21%|██▏       | 333/1563 [00:14<00:50, 24.38it/s]

batch 330 loss: 0.33879345953464507


Train, Epoch 10 / 20:  22%|██▏       | 345/1563 [00:14<00:49, 24.63it/s]

batch 340 loss: 0.22797172293066978


Train, Epoch 10 / 20:  23%|██▎       | 354/1563 [00:14<00:49, 24.45it/s]

batch 350 loss: 0.29747961163520814


Train, Epoch 10 / 20:  23%|██▎       | 363/1563 [00:15<00:48, 24.80it/s]

batch 360 loss: 0.2730304218828678


Train, Epoch 10 / 20:  24%|██▍       | 372/1563 [00:15<00:49, 24.29it/s]

batch 370 loss: 0.26039229445159434


Train, Epoch 10 / 20:  25%|██▍       | 384/1563 [00:16<00:48, 24.49it/s]

batch 380 loss: 0.21829143017530442


Train, Epoch 10 / 20:  25%|██▌       | 393/1563 [00:16<00:48, 24.29it/s]

batch 390 loss: 0.31704411953687667


Train, Epoch 10 / 20:  26%|██▌       | 405/1563 [00:17<00:47, 24.31it/s]

batch 400 loss: 0.24686308205127716


Train, Epoch 10 / 20:  26%|██▋       | 414/1563 [00:17<00:47, 24.32it/s]

batch 410 loss: 0.29614018462598324


Train, Epoch 10 / 20:  27%|██▋       | 423/1563 [00:17<00:47, 24.25it/s]

batch 420 loss: 0.3010003849864006


Train, Epoch 10 / 20:  28%|██▊       | 435/1563 [00:18<00:46, 24.35it/s]

batch 430 loss: 0.24588389620184897


Train, Epoch 10 / 20:  28%|██▊       | 444/1563 [00:18<00:46, 24.20it/s]

batch 440 loss: 0.2552258506417274


Train, Epoch 10 / 20:  29%|██▉       | 453/1563 [00:19<00:45, 24.20it/s]

batch 450 loss: 0.3240756824612617


Train, Epoch 10 / 20:  30%|██▉       | 462/1563 [00:19<00:45, 24.29it/s]

batch 460 loss: 0.22036198005080224


Train, Epoch 10 / 20:  30%|███       | 474/1563 [00:19<00:45, 23.98it/s]

batch 470 loss: 0.29165196046233177


Train, Epoch 10 / 20:  31%|███       | 483/1563 [00:20<00:44, 24.33it/s]

batch 480 loss: 0.31495187282562254


Train, Epoch 10 / 20:  32%|███▏      | 495/1563 [00:20<00:43, 24.58it/s]

batch 490 loss: 0.23848020732402803


Train, Epoch 10 / 20:  32%|███▏      | 504/1563 [00:21<00:42, 24.70it/s]

batch 500 loss: 0.2524456650018692


Train, Epoch 10 / 20:  33%|███▎      | 513/1563 [00:21<00:42, 24.60it/s]

batch 510 loss: 0.2063457690179348


Train, Epoch 10 / 20:  33%|███▎      | 522/1563 [00:21<00:43, 24.04it/s]

batch 520 loss: 0.31081063449382784


Train, Epoch 10 / 20:  34%|███▍      | 534/1563 [00:22<00:44, 23.32it/s]

batch 530 loss: 0.23935176730155944


Train, Epoch 10 / 20:  35%|███▍      | 543/1563 [00:22<00:47, 21.65it/s]

batch 540 loss: 0.24617237597703934


Train, Epoch 10 / 20:  35%|███▌      | 552/1563 [00:23<00:46, 21.53it/s]

batch 550 loss: 0.26998069137334824


Train, Epoch 10 / 20:  36%|███▌      | 564/1563 [00:23<00:48, 20.81it/s]

batch 560 loss: 0.3025904208421707


Train, Epoch 10 / 20:  37%|███▋      | 573/1563 [00:24<00:47, 20.88it/s]

batch 570 loss: 0.3339842244982719


Train, Epoch 10 / 20:  37%|███▋      | 582/1563 [00:24<00:47, 20.67it/s]

batch 580 loss: 0.263468798995018


Train, Epoch 10 / 20:  38%|███▊      | 594/1563 [00:25<00:41, 23.19it/s]

batch 590 loss: 0.28303655087947843


Train, Epoch 10 / 20:  39%|███▊      | 603/1563 [00:25<00:40, 23.96it/s]

batch 600 loss: 0.27807202488183974


Train, Epoch 10 / 20:  39%|███▉      | 615/1563 [00:26<00:39, 24.04it/s]

batch 610 loss: 0.2741406038403511


Train, Epoch 10 / 20:  40%|███▉      | 624/1563 [00:26<00:38, 24.25it/s]

batch 620 loss: 0.24260495454072953


Train, Epoch 10 / 20:  40%|████      | 633/1563 [00:26<00:38, 24.23it/s]

batch 630 loss: 0.2936805322766304


Train, Epoch 10 / 20:  41%|████▏     | 645/1563 [00:27<00:37, 24.29it/s]

batch 640 loss: 0.2865866832435131


Train, Epoch 10 / 20:  42%|████▏     | 654/1563 [00:27<00:37, 24.50it/s]

batch 650 loss: 0.22772187180817127


Train, Epoch 10 / 20:  42%|████▏     | 663/1563 [00:28<00:37, 24.29it/s]

batch 660 loss: 0.326946297287941


Train, Epoch 10 / 20:  43%|████▎     | 675/1563 [00:28<00:36, 24.42it/s]

batch 670 loss: 0.2941096410155296


Train, Epoch 10 / 20:  44%|████▍     | 684/1563 [00:28<00:36, 24.33it/s]

batch 680 loss: 0.2554058477282524


Train, Epoch 10 / 20:  44%|████▍     | 693/1563 [00:29<00:35, 24.25it/s]

batch 690 loss: 0.2359485521912575


Train, Epoch 10 / 20:  45%|████▌     | 705/1563 [00:29<00:35, 24.46it/s]

batch 700 loss: 0.2118362955749035


Train, Epoch 10 / 20:  46%|████▌     | 714/1563 [00:30<00:35, 24.23it/s]

batch 710 loss: 0.28012869954109193


Train, Epoch 10 / 20:  46%|████▋     | 723/1563 [00:30<00:35, 23.62it/s]

batch 720 loss: 0.2841538205742836


Train, Epoch 10 / 20:  47%|████▋     | 732/1563 [00:30<00:34, 23.94it/s]

batch 730 loss: 0.29039173386991024


Train, Epoch 10 / 20:  48%|████▊     | 744/1563 [00:31<00:33, 24.20it/s]

batch 740 loss: 0.3460027568042278


Train, Epoch 10 / 20:  48%|████▊     | 753/1563 [00:31<00:33, 24.03it/s]

batch 750 loss: 0.2610479101538658


Train, Epoch 10 / 20:  49%|████▉     | 762/1563 [00:32<00:33, 23.87it/s]

batch 760 loss: 0.32400163263082504


Train, Epoch 10 / 20:  50%|████▉     | 774/1563 [00:32<00:32, 24.27it/s]

batch 770 loss: 0.3117115445435047


Train, Epoch 10 / 20:  50%|█████     | 783/1563 [00:32<00:31, 24.60it/s]

batch 780 loss: 0.2466026395559311


Train, Epoch 10 / 20:  51%|█████     | 795/1563 [00:33<00:31, 24.36it/s]

batch 790 loss: 0.2867956396192312


Train, Epoch 10 / 20:  51%|█████▏    | 804/1563 [00:33<00:31, 24.45it/s]

batch 800 loss: 0.37599184811115266


Train, Epoch 10 / 20:  52%|█████▏    | 813/1563 [00:34<00:31, 24.17it/s]

batch 810 loss: 0.38280766308307645


Train, Epoch 10 / 20:  53%|█████▎    | 822/1563 [00:34<00:30, 24.13it/s]

batch 820 loss: 0.29144482016563417


Train, Epoch 10 / 20:  53%|█████▎    | 834/1563 [00:35<00:33, 21.62it/s]

batch 830 loss: 0.35674161911010743


Train, Epoch 10 / 20:  54%|█████▍    | 843/1563 [00:35<00:32, 21.99it/s]

batch 840 loss: 0.3302350714802742


Train, Epoch 10 / 20:  55%|█████▍    | 852/1563 [00:35<00:31, 22.52it/s]

batch 850 loss: 0.23667147308588027


Train, Epoch 10 / 20:  55%|█████▌    | 864/1563 [00:36<00:33, 20.72it/s]

batch 860 loss: 0.2642344169318676


Train, Epoch 10 / 20:  56%|█████▌    | 873/1563 [00:37<00:33, 20.60it/s]

batch 870 loss: 0.21354278847575187


Train, Epoch 10 / 20:  56%|█████▋    | 882/1563 [00:37<00:30, 22.38it/s]

batch 880 loss: 0.3306238383054733


Train, Epoch 10 / 20:  57%|█████▋    | 894/1563 [00:37<00:28, 23.87it/s]

batch 890 loss: 0.338058752566576


Train, Epoch 10 / 20:  58%|█████▊    | 903/1563 [00:38<00:27, 24.08it/s]

batch 900 loss: 0.285371932387352


Train, Epoch 10 / 20:  58%|█████▊    | 912/1563 [00:38<00:26, 24.14it/s]

batch 910 loss: 0.27187995314598085


Train, Epoch 10 / 20:  59%|█████▉    | 924/1563 [00:39<00:26, 24.39it/s]

batch 920 loss: 0.27317156493663786


Train, Epoch 10 / 20:  60%|█████▉    | 933/1563 [00:39<00:26, 24.01it/s]

batch 930 loss: 0.25779973790049554


Train, Epoch 10 / 20:  60%|██████    | 945/1563 [00:40<00:25, 24.49it/s]

batch 940 loss: 0.2576716050505638


Train, Epoch 10 / 20:  61%|██████    | 954/1563 [00:40<00:25, 24.16it/s]

batch 950 loss: 0.29538106471300124


Train, Epoch 10 / 20:  62%|██████▏   | 963/1563 [00:40<00:24, 24.36it/s]

batch 960 loss: 0.32032442539930345


Train, Epoch 10 / 20:  62%|██████▏   | 975/1563 [00:41<00:24, 24.40it/s]

batch 970 loss: 0.22325844913721085


Train, Epoch 10 / 20:  63%|██████▎   | 984/1563 [00:41<00:24, 24.06it/s]

batch 980 loss: 0.19633842408657073


Train, Epoch 10 / 20:  64%|██████▎   | 993/1563 [00:41<00:23, 24.25it/s]

batch 990 loss: 0.32306470572948454


Train, Epoch 10 / 20:  64%|██████▍   | 1005/1563 [00:42<00:23, 24.08it/s]

batch 1000 loss: 0.2684148348867893


Train, Epoch 10 / 20:  65%|██████▍   | 1014/1563 [00:42<00:22, 24.22it/s]

batch 1010 loss: 0.2796421110630035


Train, Epoch 10 / 20:  65%|██████▌   | 1023/1563 [00:43<00:22, 24.34it/s]

batch 1020 loss: 0.2454416498541832


Train, Epoch 10 / 20:  66%|██████▌   | 1035/1563 [00:43<00:21, 24.44it/s]

batch 1030 loss: 0.23010548949241638


Train, Epoch 10 / 20:  67%|██████▋   | 1044/1563 [00:44<00:21, 24.26it/s]

batch 1040 loss: 0.31239710822701455


Train, Epoch 10 / 20:  67%|██████▋   | 1053/1563 [00:44<00:21, 24.14it/s]

batch 1050 loss: 0.1943062473088503


Train, Epoch 10 / 20:  68%|██████▊   | 1065/1563 [00:44<00:20, 24.34it/s]

batch 1060 loss: 0.26858745217323304


Train, Epoch 10 / 20:  69%|██████▊   | 1074/1563 [00:45<00:20, 24.13it/s]

batch 1070 loss: 0.25643508434295653


Train, Epoch 10 / 20:  69%|██████▉   | 1083/1563 [00:45<00:19, 24.11it/s]

batch 1080 loss: 0.21342942789196967


Train, Epoch 10 / 20:  70%|███████   | 1095/1563 [00:46<00:19, 24.55it/s]

batch 1090 loss: 0.25091516189277174


Train, Epoch 10 / 20:  71%|███████   | 1104/1563 [00:46<00:18, 24.30it/s]

batch 1100 loss: 0.23300397619605065


Train, Epoch 10 / 20:  71%|███████   | 1113/1563 [00:46<00:18, 24.11it/s]

batch 1110 loss: 0.3093419797718525


Train, Epoch 10 / 20:  72%|███████▏  | 1122/1563 [00:47<00:19, 22.63it/s]

batch 1120 loss: 0.33449349403381345


Train, Epoch 10 / 20:  73%|███████▎  | 1134/1563 [00:47<00:19, 22.25it/s]

batch 1130 loss: 0.31194276213645933


Train, Epoch 10 / 20:  73%|███████▎  | 1143/1563 [00:48<00:20, 20.97it/s]

batch 1140 loss: 0.27435884773731234


Train, Epoch 10 / 20:  74%|███████▎  | 1152/1563 [00:48<00:19, 21.01it/s]

batch 1150 loss: 0.28313675969839097


Train, Epoch 10 / 20:  74%|███████▍  | 1164/1563 [00:49<00:18, 21.05it/s]

batch 1160 loss: 0.3017993099987507


Train, Epoch 10 / 20:  75%|███████▌  | 1173/1563 [00:49<00:17, 22.14it/s]

batch 1170 loss: 0.30833143927156925


Train, Epoch 10 / 20:  76%|███████▌  | 1182/1563 [00:50<00:16, 23.42it/s]

batch 1180 loss: 0.18749852776527404


Train, Epoch 10 / 20:  76%|███████▋  | 1194/1563 [00:50<00:15, 23.87it/s]

batch 1190 loss: 0.24924026168882846


Train, Epoch 10 / 20:  77%|███████▋  | 1203/1563 [00:51<00:14, 24.09it/s]

batch 1200 loss: 0.23447803556919097


Train, Epoch 10 / 20:  78%|███████▊  | 1212/1563 [00:51<00:14, 24.22it/s]

batch 1210 loss: 0.31787410080432893


Train, Epoch 10 / 20:  78%|███████▊  | 1224/1563 [00:51<00:13, 24.24it/s]

batch 1220 loss: 0.290177609026432


Train, Epoch 10 / 20:  79%|███████▉  | 1233/1563 [00:52<00:13, 24.31it/s]

batch 1230 loss: 0.2969289518892765


Train, Epoch 10 / 20:  79%|███████▉  | 1242/1563 [00:52<00:13, 24.33it/s]

batch 1240 loss: 0.33560808151960375


Train, Epoch 10 / 20:  80%|████████  | 1254/1563 [00:53<00:12, 24.26it/s]

batch 1250 loss: 0.34052281826734543


Train, Epoch 10 / 20:  81%|████████  | 1263/1563 [00:53<00:12, 24.29it/s]

batch 1260 loss: 0.26001856848597527


Train, Epoch 10 / 20:  81%|████████▏ | 1272/1563 [00:53<00:12, 24.17it/s]

batch 1270 loss: 0.33545264378190043


Train, Epoch 10 / 20:  82%|████████▏ | 1284/1563 [00:54<00:11, 24.46it/s]

batch 1280 loss: 0.20649578124284745


Train, Epoch 10 / 20:  83%|████████▎ | 1293/1563 [00:54<00:11, 23.81it/s]

batch 1290 loss: 0.3309701502323151


Train, Epoch 10 / 20:  83%|████████▎ | 1305/1563 [00:55<00:10, 24.43it/s]

batch 1300 loss: 0.34312073364853857


Train, Epoch 10 / 20:  84%|████████▍ | 1314/1563 [00:55<00:10, 24.55it/s]

batch 1310 loss: 0.27002957463264465


Train, Epoch 10 / 20:  85%|████████▍ | 1323/1563 [00:55<00:09, 24.15it/s]

batch 1320 loss: 0.3063272014260292


Train, Epoch 10 / 20:  85%|████████▌ | 1335/1563 [00:56<00:09, 24.48it/s]

batch 1330 loss: 0.2502034783363342


Train, Epoch 10 / 20:  86%|████████▌ | 1344/1563 [00:56<00:09, 24.01it/s]

batch 1340 loss: 0.35378076508641243


Train, Epoch 10 / 20:  87%|████████▋ | 1353/1563 [00:57<00:08, 24.36it/s]

batch 1350 loss: 0.24961869716644286


Train, Epoch 10 / 20:  87%|████████▋ | 1365/1563 [00:57<00:08, 24.50it/s]

batch 1360 loss: 0.34956386387348176


Train, Epoch 10 / 20:  88%|████████▊ | 1374/1563 [00:58<00:07, 24.12it/s]

batch 1370 loss: 0.2174866758286953


Train, Epoch 10 / 20:  88%|████████▊ | 1383/1563 [00:58<00:07, 24.41it/s]

batch 1380 loss: 0.2810308150947094


Train, Epoch 10 / 20:  89%|████████▉ | 1392/1563 [00:58<00:07, 24.24it/s]

batch 1390 loss: 0.2293469063937664


Train, Epoch 10 / 20:  90%|████████▉ | 1404/1563 [00:59<00:06, 24.38it/s]

batch 1400 loss: 0.28021337613463404


Train, Epoch 10 / 20:  90%|█████████ | 1413/1563 [00:59<00:06, 22.38it/s]

batch 1410 loss: 0.23500030413269996


Train, Epoch 10 / 20:  91%|█████████ | 1422/1563 [01:00<00:06, 21.61it/s]

batch 1420 loss: 0.3434141680598259


Train, Epoch 10 / 20:  92%|█████████▏| 1434/1563 [01:00<00:05, 22.21it/s]

batch 1430 loss: 0.2748253583908081


Train, Epoch 10 / 20:  92%|█████████▏| 1443/1563 [01:01<00:05, 22.02it/s]

batch 1440 loss: 0.2133644036948681


Train, Epoch 10 / 20:  93%|█████████▎| 1452/1563 [01:01<00:05, 21.55it/s]

batch 1450 loss: 0.3511425867676735


Train, Epoch 10 / 20:  94%|█████████▎| 1464/1563 [01:02<00:04, 21.88it/s]

batch 1460 loss: 0.18209974020719527


Train, Epoch 10 / 20:  94%|█████████▍| 1473/1563 [01:02<00:03, 23.58it/s]

batch 1470 loss: 0.28571719080209734


Train, Epoch 10 / 20:  95%|█████████▌| 1485/1563 [01:02<00:03, 24.28it/s]

batch 1480 loss: 0.2824013441801071


Train, Epoch 10 / 20:  96%|█████████▌| 1494/1563 [01:03<00:02, 24.55it/s]

batch 1490 loss: 0.29757071286439896


Train, Epoch 10 / 20:  96%|█████████▌| 1503/1563 [01:03<00:02, 24.72it/s]

batch 1500 loss: 0.2832053564488888


Train, Epoch 10 / 20:  97%|█████████▋| 1515/1563 [01:04<00:01, 24.68it/s]

batch 1510 loss: 0.2524305799975991


Train, Epoch 10 / 20:  98%|█████████▊| 1524/1563 [01:04<00:01, 24.72it/s]

batch 1520 loss: 0.2325614832341671


Train, Epoch 10 / 20:  98%|█████████▊| 1533/1563 [01:04<00:01, 24.25it/s]

batch 1530 loss: 0.22743663117289542


Train, Epoch 10 / 20:  99%|█████████▉| 1545/1563 [01:05<00:00, 24.45it/s]

batch 1540 loss: 0.2990418940782547


Train, Epoch 10 / 20:  99%|█████████▉| 1554/1563 [01:05<00:00, 24.19it/s]

batch 1550 loss: 0.2741449363529682


Train, Epoch 10 / 20: 100%|██████████| 1563/1563 [01:06<00:00, 23.63it/s]


batch 1560 loss: 0.32641594409942626


Test, Epoch 10 / 20: 100%|██████████| 1563/1563 [00:30<00:00, 51.29it/s]


Epoch 10, loss: 0.4563367025381327, accuracy: 0.81172


Train, Epoch 11 / 20:   1%|          | 12/1563 [00:00<01:10, 21.98it/s]

batch 10 loss: 0.22855604365468024


Train, Epoch 11 / 20:   2%|▏         | 24/1563 [00:01<01:14, 20.64it/s]

batch 20 loss: 0.25296926498413086


Train, Epoch 11 / 20:   2%|▏         | 33/1563 [00:01<01:12, 21.06it/s]

batch 30 loss: 0.28725199699401854


Train, Epoch 11 / 20:   3%|▎         | 42/1563 [00:01<01:11, 21.38it/s]

batch 40 loss: 0.21739770770072936


Train, Epoch 11 / 20:   3%|▎         | 54/1563 [00:02<01:13, 20.66it/s]

batch 50 loss: 0.2638295903801918


Train, Epoch 11 / 20:   4%|▍         | 63/1563 [00:02<01:05, 22.78it/s]

batch 60 loss: 0.25082312524318695


Train, Epoch 11 / 20:   5%|▍         | 72/1563 [00:03<01:03, 23.66it/s]

batch 70 loss: 0.20533812344074248


Train, Epoch 11 / 20:   5%|▌         | 84/1563 [00:03<01:00, 24.37it/s]

batch 80 loss: 0.2707702860236168


Train, Epoch 11 / 20:   6%|▌         | 93/1563 [00:04<00:59, 24.51it/s]

batch 90 loss: 0.21572867333889006


Train, Epoch 11 / 20:   7%|▋         | 102/1563 [00:04<01:00, 24.24it/s]

batch 100 loss: 0.2127251535654068


Train, Epoch 11 / 20:   7%|▋         | 114/1563 [00:05<00:59, 24.26it/s]

batch 110 loss: 0.2910459332168102


Train, Epoch 11 / 20:   8%|▊         | 123/1563 [00:05<00:58, 24.51it/s]

batch 120 loss: 0.2710141122341156


Train, Epoch 11 / 20:   8%|▊         | 132/1563 [00:05<00:58, 24.26it/s]

batch 130 loss: 0.2552903354167938


Train, Epoch 11 / 20:   9%|▉         | 144/1563 [00:06<00:58, 24.36it/s]

batch 140 loss: 0.27533777356147765


Train, Epoch 11 / 20:  10%|▉         | 153/1563 [00:06<00:57, 24.53it/s]

batch 150 loss: 0.21813806891441345


Train, Epoch 11 / 20:  11%|█         | 165/1563 [00:07<00:56, 24.81it/s]

batch 160 loss: 0.2642866957932711


Train, Epoch 11 / 20:  11%|█         | 174/1563 [00:07<00:55, 24.85it/s]

batch 170 loss: 0.3160302549600601


Train, Epoch 11 / 20:  12%|█▏        | 183/1563 [00:07<00:56, 24.37it/s]

batch 180 loss: 0.31937087774276735


Train, Epoch 11 / 20:  12%|█▏        | 195/1563 [00:08<00:55, 24.52it/s]

batch 190 loss: 0.26463796496391295


Train, Epoch 11 / 20:  13%|█▎        | 204/1563 [00:08<00:56, 24.05it/s]

batch 200 loss: 0.22029609680175782


Train, Epoch 11 / 20:  14%|█▎        | 213/1563 [00:09<00:55, 24.30it/s]

batch 210 loss: 0.2513807713985443


Train, Epoch 11 / 20:  14%|█▍        | 225/1563 [00:09<00:55, 24.15it/s]

batch 220 loss: 0.3566250137984753


Train, Epoch 11 / 20:  15%|█▍        | 234/1563 [00:09<00:55, 24.15it/s]

batch 230 loss: 0.2567486599087715


Train, Epoch 11 / 20:  16%|█▌        | 243/1563 [00:10<00:53, 24.45it/s]

batch 240 loss: 0.3211390271782875


Train, Epoch 11 / 20:  16%|█▋        | 255/1563 [00:10<00:53, 24.25it/s]

batch 250 loss: 0.2730062995105982


Train, Epoch 11 / 20:  17%|█▋        | 264/1563 [00:11<00:53, 24.37it/s]

batch 260 loss: 0.2571390479803085


Train, Epoch 11 / 20:  17%|█▋        | 273/1563 [00:11<00:52, 24.52it/s]

batch 270 loss: 0.2357072576880455


Train, Epoch 11 / 20:  18%|█▊        | 285/1563 [00:12<00:51, 24.64it/s]

batch 280 loss: 0.2142909899353981


Train, Epoch 11 / 20:  19%|█▉        | 294/1563 [00:12<00:52, 24.25it/s]

batch 290 loss: 0.2671311847865582


Train, Epoch 11 / 20:  19%|█▉        | 303/1563 [00:12<00:54, 22.92it/s]

batch 300 loss: 0.27172470800578596


Train, Epoch 11 / 20:  20%|█▉        | 312/1563 [00:13<00:58, 21.48it/s]

batch 310 loss: 0.2677591428160667


Train, Epoch 11 / 20:  21%|██        | 324/1563 [00:13<00:56, 22.11it/s]

batch 320 loss: 0.2377843104302883


Train, Epoch 11 / 20:  21%|██▏       | 333/1563 [00:14<00:55, 22.12it/s]

batch 330 loss: 0.28493317514657973


Train, Epoch 11 / 20:  22%|██▏       | 342/1563 [00:14<00:56, 21.72it/s]

batch 340 loss: 0.26023269072175026


Train, Epoch 11 / 20:  23%|██▎       | 354/1563 [00:15<00:56, 21.48it/s]

batch 350 loss: 0.27951327711343765


Train, Epoch 11 / 20:  23%|██▎       | 363/1563 [00:15<00:51, 23.46it/s]

batch 360 loss: 0.3036467768251896


Train, Epoch 11 / 20:  24%|██▍       | 372/1563 [00:15<00:50, 23.62it/s]

batch 370 loss: 0.32549584805965426


Train, Epoch 11 / 20:  25%|██▍       | 384/1563 [00:16<00:48, 24.21it/s]

batch 380 loss: 0.23416438177227974


Train, Epoch 11 / 20:  25%|██▌       | 393/1563 [00:16<00:47, 24.52it/s]

batch 390 loss: 0.21338564679026603


Train, Epoch 11 / 20:  26%|██▌       | 405/1563 [00:17<00:47, 24.25it/s]

batch 400 loss: 0.16606895998120308


Train, Epoch 11 / 20:  26%|██▋       | 414/1563 [00:17<00:47, 24.41it/s]

batch 410 loss: 0.33362795412540436


Train, Epoch 11 / 20:  27%|██▋       | 423/1563 [00:18<00:46, 24.34it/s]

batch 420 loss: 0.2434712402522564


Train, Epoch 11 / 20:  28%|██▊       | 432/1563 [00:18<00:46, 24.26it/s]

batch 430 loss: 0.309719181060791


Train, Epoch 11 / 20:  28%|██▊       | 444/1563 [00:18<00:46, 24.14it/s]

batch 440 loss: 0.2804493546485901


Train, Epoch 11 / 20:  29%|██▉       | 453/1563 [00:19<00:45, 24.50it/s]

batch 450 loss: 0.34572083204984666


Train, Epoch 11 / 20:  30%|██▉       | 465/1563 [00:19<00:45, 24.40it/s]

batch 460 loss: 0.28925401344895363


Train, Epoch 11 / 20:  30%|███       | 474/1563 [00:20<00:44, 24.49it/s]

batch 470 loss: 0.28742215111851693


Train, Epoch 11 / 20:  31%|███       | 483/1563 [00:20<00:43, 24.66it/s]

batch 480 loss: 0.25042696967720984


Train, Epoch 11 / 20:  32%|███▏      | 495/1563 [00:21<00:43, 24.44it/s]

batch 490 loss: 0.18129161447286607


Train, Epoch 11 / 20:  32%|███▏      | 504/1563 [00:21<00:43, 24.47it/s]

batch 500 loss: 0.23810827136039733


Train, Epoch 11 / 20:  33%|███▎      | 513/1563 [00:21<00:43, 24.12it/s]

batch 510 loss: 0.28313823193311694


Train, Epoch 11 / 20:  34%|███▎      | 525/1563 [00:22<00:42, 24.59it/s]

batch 520 loss: 0.2091760165989399


Train, Epoch 11 / 20:  34%|███▍      | 534/1563 [00:22<00:41, 24.63it/s]

batch 530 loss: 0.2063553337007761


Train, Epoch 11 / 20:  35%|███▍      | 543/1563 [00:23<00:42, 24.17it/s]

batch 540 loss: 0.23176629915833474


Train, Epoch 11 / 20:  36%|███▌      | 555/1563 [00:23<00:40, 24.60it/s]

batch 550 loss: 0.22369619682431222


Train, Epoch 11 / 20:  36%|███▌      | 564/1563 [00:23<00:40, 24.72it/s]

batch 560 loss: 0.24219630062580108


Train, Epoch 11 / 20:  37%|███▋      | 573/1563 [00:24<00:40, 24.42it/s]

batch 570 loss: 0.3188256904482841


Train, Epoch 11 / 20:  37%|███▋      | 582/1563 [00:24<00:40, 24.30it/s]

batch 580 loss: 0.3440134175121784


Train, Epoch 11 / 20:  38%|███▊      | 594/1563 [00:25<00:40, 24.10it/s]

batch 590 loss: 0.26214728504419327


Train, Epoch 11 / 20:  39%|███▊      | 603/1563 [00:25<00:42, 22.34it/s]

batch 600 loss: 0.28900623694062233


Train, Epoch 11 / 20:  39%|███▉      | 612/1563 [00:25<00:45, 20.73it/s]

batch 610 loss: 0.3274493977427483


Train, Epoch 11 / 20:  40%|███▉      | 624/1563 [00:26<00:44, 20.87it/s]

batch 620 loss: 0.24233056828379632


Train, Epoch 11 / 20:  40%|████      | 633/1563 [00:26<00:44, 21.00it/s]

batch 630 loss: 0.17482297345995904


Train, Epoch 11 / 20:  41%|████      | 642/1563 [00:27<00:43, 21.14it/s]

batch 640 loss: 0.19195099622011186


Train, Epoch 11 / 20:  42%|████▏     | 654/1563 [00:27<00:39, 22.94it/s]

batch 650 loss: 0.21907906793057919


Train, Epoch 11 / 20:  42%|████▏     | 663/1563 [00:28<00:37, 23.89it/s]

batch 660 loss: 0.23286831974983216


Train, Epoch 11 / 20:  43%|████▎     | 675/1563 [00:28<00:36, 24.53it/s]

batch 670 loss: 0.29531966485083105


Train, Epoch 11 / 20:  44%|████▍     | 684/1563 [00:29<00:36, 24.07it/s]

batch 680 loss: 0.2627111241221428


Train, Epoch 11 / 20:  44%|████▍     | 693/1563 [00:29<00:35, 24.36it/s]

batch 690 loss: 0.27438763827085494


Train, Epoch 11 / 20:  45%|████▌     | 705/1563 [00:30<00:35, 24.36it/s]

batch 700 loss: 0.33133906200528146


Train, Epoch 11 / 20:  46%|████▌     | 714/1563 [00:30<00:34, 24.31it/s]

batch 710 loss: 0.2593557730317116


Train, Epoch 11 / 20:  46%|████▋     | 723/1563 [00:30<00:34, 24.47it/s]

batch 720 loss: 0.24094868898391725


Train, Epoch 11 / 20:  47%|████▋     | 732/1563 [00:31<00:34, 24.02it/s]

batch 730 loss: 0.2323174722492695


Train, Epoch 11 / 20:  48%|████▊     | 744/1563 [00:31<00:34, 23.82it/s]

batch 740 loss: 0.26196445152163506


Train, Epoch 11 / 20:  48%|████▊     | 753/1563 [00:32<00:33, 24.03it/s]

batch 750 loss: 0.22130397111177444


Train, Epoch 11 / 20:  49%|████▉     | 765/1563 [00:32<00:33, 24.16it/s]

batch 760 loss: 0.31711421981453897


Train, Epoch 11 / 20:  50%|████▉     | 774/1563 [00:32<00:32, 24.17it/s]

batch 770 loss: 0.32584182918071747


Train, Epoch 11 / 20:  50%|█████     | 783/1563 [00:33<00:32, 24.14it/s]

batch 780 loss: 0.30702437460422516


Train, Epoch 11 / 20:  51%|█████     | 795/1563 [00:33<00:31, 24.48it/s]

batch 790 loss: 0.28235761225223543


Train, Epoch 11 / 20:  51%|█████▏    | 804/1563 [00:34<00:31, 24.04it/s]

batch 800 loss: 0.32653856128454206


Train, Epoch 11 / 20:  52%|█████▏    | 813/1563 [00:34<00:31, 24.16it/s]

batch 810 loss: 0.32621958404779433


Train, Epoch 11 / 20:  53%|█████▎    | 825/1563 [00:35<00:30, 24.27it/s]

batch 820 loss: 0.2878663897514343


Train, Epoch 11 / 20:  53%|█████▎    | 834/1563 [00:35<00:30, 24.11it/s]

batch 830 loss: 0.21581324189901352


Train, Epoch 11 / 20:  54%|█████▍    | 843/1563 [00:35<00:29, 24.26it/s]

batch 840 loss: 0.3570961892604828


Train, Epoch 11 / 20:  55%|█████▍    | 855/1563 [00:36<00:29, 24.39it/s]

batch 850 loss: 0.30905754417181014


Train, Epoch 11 / 20:  55%|█████▌    | 864/1563 [00:36<00:28, 24.57it/s]

batch 860 loss: 0.2995870426297188


Train, Epoch 11 / 20:  56%|█████▌    | 873/1563 [00:37<00:28, 24.05it/s]

batch 870 loss: 0.2941503018140793


Train, Epoch 11 / 20:  56%|█████▋    | 882/1563 [00:37<00:28, 24.00it/s]

batch 880 loss: 0.3012147396802902


Train, Epoch 11 / 20:  57%|█████▋    | 894/1563 [00:37<00:29, 22.62it/s]

batch 890 loss: 0.2710065692663193


Train, Epoch 11 / 20:  58%|█████▊    | 903/1563 [00:38<00:30, 21.69it/s]

batch 900 loss: 0.28317693173885344


Train, Epoch 11 / 20:  58%|█████▊    | 912/1563 [00:38<00:30, 21.41it/s]

batch 910 loss: 0.2703711912035942


Train, Epoch 11 / 20:  59%|█████▉    | 924/1563 [00:39<00:30, 20.72it/s]

batch 920 loss: 0.267990930378437


Train, Epoch 11 / 20:  60%|█████▉    | 933/1563 [00:39<00:30, 20.69it/s]

batch 930 loss: 0.18753359243273734


Train, Epoch 11 / 20:  60%|██████    | 945/1563 [00:40<00:27, 22.54it/s]

batch 940 loss: 0.22795361168682576


Train, Epoch 11 / 20:  61%|██████    | 954/1563 [00:40<00:25, 23.80it/s]

batch 950 loss: 0.2980655744671822


Train, Epoch 11 / 20:  62%|██████▏   | 963/1563 [00:41<00:24, 24.22it/s]

batch 960 loss: 0.2632740348577499


Train, Epoch 11 / 20:  62%|██████▏   | 972/1563 [00:41<00:24, 24.02it/s]

batch 970 loss: 0.2971124403178692


Train, Epoch 11 / 20:  63%|██████▎   | 984/1563 [00:41<00:23, 24.41it/s]

batch 980 loss: 0.29686368703842164


Train, Epoch 11 / 20:  64%|██████▎   | 993/1563 [00:42<00:23, 24.57it/s]

batch 990 loss: 0.27144216299057006


Train, Epoch 11 / 20:  64%|██████▍   | 1002/1563 [00:42<00:23, 24.18it/s]

batch 1000 loss: 0.24389251917600632


Train, Epoch 11 / 20:  65%|██████▍   | 1014/1563 [00:43<00:23, 23.84it/s]

batch 1010 loss: 0.20283059179782867


Train, Epoch 11 / 20:  65%|██████▌   | 1023/1563 [00:43<00:22, 24.23it/s]

batch 1020 loss: 0.2586612209677696


Train, Epoch 11 / 20:  66%|██████▌   | 1032/1563 [00:43<00:22, 23.83it/s]

batch 1030 loss: 0.23232620358467101


Train, Epoch 11 / 20:  67%|██████▋   | 1044/1563 [00:44<00:21, 24.27it/s]

batch 1040 loss: 0.2832019865512848


Train, Epoch 11 / 20:  67%|██████▋   | 1053/1563 [00:44<00:20, 24.30it/s]

batch 1050 loss: 0.27525814175605773


Train, Epoch 11 / 20:  68%|██████▊   | 1062/1563 [00:45<00:20, 24.11it/s]

batch 1060 loss: 0.23061230033636093


Train, Epoch 11 / 20:  69%|██████▊   | 1074/1563 [00:45<00:20, 24.19it/s]

batch 1070 loss: 0.3301143802702427


Train, Epoch 11 / 20:  69%|██████▉   | 1083/1563 [00:46<00:19, 24.18it/s]

batch 1080 loss: 0.3730453997850418


Train, Epoch 11 / 20:  70%|███████   | 1095/1563 [00:46<00:19, 24.42it/s]

batch 1090 loss: 0.31857939511537553


Train, Epoch 11 / 20:  71%|███████   | 1104/1563 [00:46<00:18, 24.49it/s]

batch 1100 loss: 0.26291652023792267


Train, Epoch 11 / 20:  71%|███████   | 1113/1563 [00:47<00:18, 24.08it/s]

batch 1110 loss: 0.295639805495739


Train, Epoch 11 / 20:  72%|███████▏  | 1122/1563 [00:47<00:18, 24.42it/s]

batch 1120 loss: 0.2184773415327072


Train, Epoch 11 / 20:  73%|███████▎  | 1134/1563 [00:48<00:17, 24.47it/s]

batch 1130 loss: 0.21679804623126983


Train, Epoch 11 / 20:  73%|███████▎  | 1143/1563 [00:48<00:17, 24.29it/s]

batch 1140 loss: 0.2085447683930397


Train, Epoch 11 / 20:  74%|███████▍  | 1155/1563 [00:49<00:16, 24.30it/s]

batch 1150 loss: 0.21889216899871827


Train, Epoch 11 / 20:  74%|███████▍  | 1164/1563 [00:49<00:16, 24.15it/s]

batch 1160 loss: 0.32359103113412857


Train, Epoch 11 / 20:  75%|███████▌  | 1173/1563 [00:49<00:16, 24.12it/s]

batch 1170 loss: 0.29879705905914306


Train, Epoch 11 / 20:  76%|███████▌  | 1182/1563 [00:50<00:16, 23.08it/s]

batch 1180 loss: 0.21288446113467216


Train, Epoch 11 / 20:  76%|███████▋  | 1194/1563 [00:50<00:17, 21.20it/s]

batch 1190 loss: 0.2414627842605114


Train, Epoch 11 / 20:  77%|███████▋  | 1203/1563 [00:51<00:17, 21.03it/s]

batch 1200 loss: 0.31736005991697314


Train, Epoch 11 / 20:  78%|███████▊  | 1212/1563 [00:51<00:17, 20.58it/s]

batch 1210 loss: 0.24593316167593002


Train, Epoch 11 / 20:  78%|███████▊  | 1224/1563 [00:52<00:16, 20.89it/s]

batch 1220 loss: 0.2838814467191696


Train, Epoch 11 / 20:  79%|███████▉  | 1233/1563 [00:52<00:15, 21.73it/s]

batch 1230 loss: 0.3008334919810295


Train, Epoch 11 / 20:  80%|███████▉  | 1245/1563 [00:53<00:13, 23.66it/s]

batch 1240 loss: 0.24826244115829468


Train, Epoch 11 / 20:  80%|████████  | 1254/1563 [00:53<00:12, 24.38it/s]

batch 1250 loss: 0.290898996591568


Train, Epoch 11 / 20:  81%|████████  | 1263/1563 [00:53<00:12, 24.54it/s]

batch 1260 loss: 0.23071626275777818


Train, Epoch 11 / 20:  82%|████████▏ | 1275/1563 [00:54<00:11, 24.15it/s]

batch 1270 loss: 0.24435800686478615


Train, Epoch 11 / 20:  82%|████████▏ | 1284/1563 [00:54<00:11, 24.36it/s]

batch 1280 loss: 0.2194340605288744


Train, Epoch 11 / 20:  83%|████████▎ | 1293/1563 [00:55<00:11, 24.05it/s]

batch 1290 loss: 0.2689445048570633


Train, Epoch 11 / 20:  83%|████████▎ | 1305/1563 [00:55<00:10, 24.42it/s]

batch 1300 loss: 0.2619119767099619


Train, Epoch 11 / 20:  84%|████████▍ | 1314/1563 [00:55<00:10, 24.61it/s]

batch 1310 loss: 0.2508157230913639


Train, Epoch 11 / 20:  85%|████████▍ | 1323/1563 [00:56<00:09, 24.27it/s]

batch 1320 loss: 0.31566028371453286


Train, Epoch 11 / 20:  85%|████████▌ | 1335/1563 [00:56<00:09, 24.51it/s]

batch 1330 loss: 0.27051454707980155


Train, Epoch 11 / 20:  86%|████████▌ | 1344/1563 [00:57<00:09, 23.79it/s]

batch 1340 loss: 0.24151060730218887


Train, Epoch 11 / 20:  87%|████████▋ | 1353/1563 [00:57<00:08, 24.24it/s]

batch 1350 loss: 0.28685968518257143


Train, Epoch 11 / 20:  87%|████████▋ | 1365/1563 [00:58<00:08, 24.44it/s]

batch 1360 loss: 0.19976541846990586


Train, Epoch 11 / 20:  88%|████████▊ | 1374/1563 [00:58<00:07, 24.13it/s]

batch 1370 loss: 0.21583594977855683


Train, Epoch 11 / 20:  88%|████████▊ | 1383/1563 [00:58<00:07, 24.42it/s]

batch 1380 loss: 0.28699431419372556


Train, Epoch 11 / 20:  89%|████████▉ | 1395/1563 [00:59<00:06, 24.15it/s]

batch 1390 loss: 0.2989678680896759


Train, Epoch 11 / 20:  90%|████████▉ | 1404/1563 [00:59<00:06, 24.24it/s]

batch 1400 loss: 0.19784628674387933


Train, Epoch 11 / 20:  90%|█████████ | 1413/1563 [01:00<00:06, 24.22it/s]

batch 1410 loss: 0.26097561866045


Train, Epoch 11 / 20:  91%|█████████ | 1425/1563 [01:00<00:05, 24.39it/s]

batch 1420 loss: 0.2753059171140194


Train, Epoch 11 / 20:  92%|█████████▏| 1434/1563 [01:00<00:05, 24.40it/s]

batch 1430 loss: 0.3536524802446365


Train, Epoch 11 / 20:  92%|█████████▏| 1443/1563 [01:01<00:05, 23.63it/s]

batch 1440 loss: 0.26317253783345224


Train, Epoch 11 / 20:  93%|█████████▎| 1455/1563 [01:01<00:04, 24.15it/s]

batch 1450 loss: 0.3157318487763405


Train, Epoch 11 / 20:  94%|█████████▎| 1464/1563 [01:02<00:04, 24.10it/s]

batch 1460 loss: 0.33051311075687406


Train, Epoch 11 / 20:  94%|█████████▍| 1473/1563 [01:02<00:03, 23.16it/s]

batch 1470 loss: 0.22971197925508022


Train, Epoch 11 / 20:  95%|█████████▍| 1482/1563 [01:03<00:03, 21.27it/s]

batch 1480 loss: 0.25397451370954516


Train, Epoch 11 / 20:  96%|█████████▌| 1494/1563 [01:03<00:03, 21.31it/s]

batch 1490 loss: 0.31184437572956086


Train, Epoch 11 / 20:  96%|█████████▌| 1503/1563 [01:04<00:02, 21.24it/s]

batch 1500 loss: 0.28904558196663854


Train, Epoch 11 / 20:  97%|█████████▋| 1512/1563 [01:04<00:02, 20.83it/s]

batch 1510 loss: 0.2956298887729645


Train, Epoch 11 / 20:  98%|█████████▊| 1524/1563 [01:04<00:01, 22.17it/s]

batch 1520 loss: 0.36977295726537707


Train, Epoch 11 / 20:  98%|█████████▊| 1533/1563 [01:05<00:01, 23.41it/s]

batch 1530 loss: 0.3126778706908226


Train, Epoch 11 / 20:  99%|█████████▉| 1545/1563 [01:05<00:00, 24.16it/s]

batch 1540 loss: 0.2697662964463234


Train, Epoch 11 / 20:  99%|█████████▉| 1554/1563 [01:06<00:00, 24.44it/s]

batch 1550 loss: 0.2522380091249943


Train, Epoch 11 / 20: 100%|██████████| 1563/1563 [01:06<00:00, 23.47it/s]


batch 1560 loss: 0.2817234680056572


Test, Epoch 11 / 20: 100%|██████████| 1563/1563 [00:30<00:00, 51.52it/s]


Epoch 11, loss: 0.46772162197753786, accuracy: 0.80768


Train, Epoch 12 / 20:   1%|          | 15/1563 [00:00<01:02, 24.59it/s]

batch 10 loss: 0.263494274020195


Train, Epoch 12 / 20:   2%|▏         | 24/1563 [00:00<01:03, 24.37it/s]

batch 20 loss: 0.24898182451725007


Train, Epoch 12 / 20:   2%|▏         | 33/1563 [00:01<01:03, 24.15it/s]

batch 30 loss: 0.32885917723178865


Train, Epoch 12 / 20:   3%|▎         | 45/1563 [00:01<01:02, 24.25it/s]

batch 40 loss: 0.2582186579704285


Train, Epoch 12 / 20:   3%|▎         | 54/1563 [00:02<01:01, 24.50it/s]

batch 50 loss: 0.18944051153957844


Train, Epoch 12 / 20:   4%|▍         | 63/1563 [00:02<01:02, 24.15it/s]

batch 60 loss: 0.3008414924144745


Train, Epoch 12 / 20:   5%|▍         | 72/1563 [00:02<01:01, 24.15it/s]

batch 70 loss: 0.2323281466960907


Train, Epoch 12 / 20:   5%|▌         | 84/1563 [00:03<01:08, 21.68it/s]

batch 80 loss: 0.3390603303909302


Train, Epoch 12 / 20:   6%|▌         | 93/1563 [00:04<01:12, 20.35it/s]

batch 90 loss: 0.28374746814370155


Train, Epoch 12 / 20:   7%|▋         | 102/1563 [00:04<01:13, 19.99it/s]

batch 100 loss: 0.22320349961519242


Train, Epoch 12 / 20:   7%|▋         | 114/1563 [00:05<01:10, 20.55it/s]

batch 110 loss: 0.2996709100902081


Train, Epoch 12 / 20:   8%|▊         | 123/1563 [00:05<01:08, 21.09it/s]

batch 120 loss: 0.32608342692255976


Train, Epoch 12 / 20:   9%|▊         | 135/1563 [00:05<00:59, 23.86it/s]

batch 130 loss: 0.24342100620269774


Train, Epoch 12 / 20:   9%|▉         | 144/1563 [00:06<00:58, 24.24it/s]

batch 140 loss: 0.22224150486290456


Train, Epoch 12 / 20:  10%|▉         | 153/1563 [00:06<00:57, 24.48it/s]

batch 150 loss: 0.2934186220169067


Train, Epoch 12 / 20:  11%|█         | 165/1563 [00:07<00:56, 24.90it/s]

batch 160 loss: 0.3497927702963352


Train, Epoch 12 / 20:  11%|█         | 174/1563 [00:07<00:56, 24.50it/s]

batch 170 loss: 0.23879795148968697


Train, Epoch 12 / 20:  12%|█▏        | 183/1563 [00:07<00:55, 24.71it/s]

batch 180 loss: 0.2492368847131729


Train, Epoch 12 / 20:  12%|█▏        | 195/1563 [00:08<00:54, 24.89it/s]

batch 190 loss: 0.30291578322649004


Train, Epoch 12 / 20:  13%|█▎        | 204/1563 [00:08<00:54, 24.82it/s]

batch 200 loss: 0.2751310169696808


Train, Epoch 12 / 20:  14%|█▎        | 213/1563 [00:09<00:54, 24.60it/s]

batch 210 loss: 0.2301620677113533


Train, Epoch 12 / 20:  14%|█▍        | 225/1563 [00:09<00:53, 24.78it/s]

batch 220 loss: 0.23569302409887313


Train, Epoch 12 / 20:  15%|█▍        | 234/1563 [00:09<00:53, 24.76it/s]

batch 230 loss: 0.295914401113987


Train, Epoch 12 / 20:  16%|█▌        | 243/1563 [00:10<00:54, 24.30it/s]

batch 240 loss: 0.15683582425117493


Train, Epoch 12 / 20:  16%|█▋        | 255/1563 [00:10<00:54, 23.97it/s]

batch 250 loss: 0.2578962229192257


Train, Epoch 12 / 20:  17%|█▋        | 264/1563 [00:11<00:53, 24.19it/s]

batch 260 loss: 0.2472493201494217


Train, Epoch 12 / 20:  17%|█▋        | 273/1563 [00:11<00:54, 23.65it/s]

batch 270 loss: 0.30060082376003266


Train, Epoch 12 / 20:  18%|█▊        | 285/1563 [00:12<00:52, 24.47it/s]

batch 280 loss: 0.2987502679228783


Train, Epoch 12 / 20:  19%|█▉        | 294/1563 [00:12<00:51, 24.55it/s]

batch 290 loss: 0.23229895308613777


Train, Epoch 12 / 20:  19%|█▉        | 303/1563 [00:12<00:51, 24.54it/s]

batch 300 loss: 0.2850616693496704


Train, Epoch 12 / 20:  20%|██        | 315/1563 [00:13<00:50, 24.76it/s]

batch 310 loss: 0.2277831181883812


Train, Epoch 12 / 20:  21%|██        | 324/1563 [00:13<00:50, 24.44it/s]

batch 320 loss: 0.2620170138776302


Train, Epoch 12 / 20:  21%|██▏       | 333/1563 [00:14<00:49, 24.69it/s]

batch 330 loss: 0.2122371256351471


Train, Epoch 12 / 20:  22%|██▏       | 345/1563 [00:14<00:48, 24.92it/s]

batch 340 loss: 0.31309452950954436


Train, Epoch 12 / 20:  23%|██▎       | 354/1563 [00:14<00:49, 24.56it/s]

batch 350 loss: 0.27013824582099916


Train, Epoch 12 / 20:  23%|██▎       | 363/1563 [00:15<00:49, 24.28it/s]

batch 360 loss: 0.30701556652784345


Train, Epoch 12 / 20:  24%|██▍       | 372/1563 [00:15<00:53, 22.20it/s]

batch 370 loss: 0.20343379583209753


Train, Epoch 12 / 20:  25%|██▍       | 384/1563 [00:16<00:52, 22.51it/s]

batch 380 loss: 0.242204799503088


Train, Epoch 12 / 20:  25%|██▌       | 393/1563 [00:16<00:51, 22.54it/s]

batch 390 loss: 0.2978051468729973


Train, Epoch 12 / 20:  26%|██▌       | 402/1563 [00:17<00:52, 21.95it/s]

batch 400 loss: 0.23896907269954681


Train, Epoch 12 / 20:  26%|██▋       | 414/1563 [00:17<00:52, 21.68it/s]

batch 410 loss: 0.27924626022577287


Train, Epoch 12 / 20:  27%|██▋       | 423/1563 [00:18<00:50, 22.76it/s]

batch 420 loss: 0.19479108415544033


Train, Epoch 12 / 20:  28%|██▊       | 435/1563 [00:18<00:46, 24.20it/s]

batch 430 loss: 0.2722854010760784


Train, Epoch 12 / 20:  28%|██▊       | 444/1563 [00:18<00:45, 24.34it/s]

batch 440 loss: 0.3040534943342209


Train, Epoch 12 / 20:  29%|██▉       | 453/1563 [00:19<00:45, 24.63it/s]

batch 450 loss: 0.3031201273202896


Train, Epoch 12 / 20:  30%|██▉       | 465/1563 [00:19<00:44, 24.75it/s]

batch 460 loss: 0.24220195040106773


Train, Epoch 12 / 20:  30%|███       | 474/1563 [00:20<00:44, 24.54it/s]

batch 470 loss: 0.23740807622671128


Train, Epoch 12 / 20:  31%|███       | 483/1563 [00:20<00:44, 24.50it/s]

batch 480 loss: 0.26713065393269064


Train, Epoch 12 / 20:  31%|███▏      | 492/1563 [00:20<00:43, 24.59it/s]

batch 490 loss: 0.32381973415613174


Train, Epoch 12 / 20:  32%|███▏      | 504/1563 [00:21<00:43, 24.51it/s]

batch 500 loss: 0.192180860042572


Train, Epoch 12 / 20:  33%|███▎      | 513/1563 [00:21<00:42, 24.56it/s]

batch 510 loss: 0.3204019881784916


Train, Epoch 12 / 20:  34%|███▎      | 525/1563 [00:22<00:42, 24.70it/s]

batch 520 loss: 0.24505698531866074


Train, Epoch 12 / 20:  34%|███▍      | 534/1563 [00:22<00:41, 24.70it/s]

batch 530 loss: 0.28540133535861967


Train, Epoch 12 / 20:  35%|███▍      | 543/1563 [00:22<00:41, 24.65it/s]

batch 540 loss: 0.20698428377509118


Train, Epoch 12 / 20:  36%|███▌      | 555/1563 [00:23<00:40, 24.85it/s]

batch 550 loss: 0.26153875067830085


Train, Epoch 12 / 20:  36%|███▌      | 564/1563 [00:23<00:40, 24.79it/s]

batch 560 loss: 0.31627253741025924


Train, Epoch 12 / 20:  37%|███▋      | 573/1563 [00:24<00:40, 24.45it/s]

batch 570 loss: 0.2587984465062618


Train, Epoch 12 / 20:  37%|███▋      | 582/1563 [00:24<00:39, 24.55it/s]

batch 580 loss: 0.28381369784474375


Train, Epoch 12 / 20:  38%|███▊      | 594/1563 [00:24<00:40, 23.95it/s]

batch 590 loss: 0.29497011452913285


Train, Epoch 12 / 20:  39%|███▊      | 603/1563 [00:25<00:39, 24.36it/s]

batch 600 loss: 0.3271564394235611


Train, Epoch 12 / 20:  39%|███▉      | 615/1563 [00:25<00:38, 24.81it/s]

batch 610 loss: 0.16464970149099828


Train, Epoch 12 / 20:  40%|███▉      | 624/1563 [00:26<00:38, 24.68it/s]

batch 620 loss: 0.2120918668806553


Train, Epoch 12 / 20:  40%|████      | 633/1563 [00:26<00:37, 24.64it/s]

batch 630 loss: 0.3385248839855194


Train, Epoch 12 / 20:  41%|████      | 642/1563 [00:26<00:37, 24.42it/s]

batch 640 loss: 0.31454998776316645


Train, Epoch 12 / 20:  42%|████▏     | 654/1563 [00:27<00:36, 24.60it/s]

batch 650 loss: 0.280697825551033


Train, Epoch 12 / 20:  42%|████▏     | 663/1563 [00:27<00:37, 24.28it/s]

batch 660 loss: 0.22782923877239228


Train, Epoch 12 / 20:  43%|████▎     | 672/1563 [00:28<00:38, 22.86it/s]

batch 670 loss: 0.2841960027813911


Train, Epoch 12 / 20:  44%|████▍     | 684/1563 [00:28<00:39, 22.17it/s]

batch 680 loss: 0.27741633653640746


Train, Epoch 12 / 20:  44%|████▍     | 693/1563 [00:29<00:40, 21.34it/s]

batch 690 loss: 0.24614312052726744


Train, Epoch 12 / 20:  45%|████▍     | 702/1563 [00:29<00:39, 21.97it/s]

batch 700 loss: 0.31073001846671106


Train, Epoch 12 / 20:  46%|████▌     | 714/1563 [00:30<00:38, 21.95it/s]

batch 710 loss: 0.2815671980381012


Train, Epoch 12 / 20:  46%|████▋     | 723/1563 [00:30<00:36, 23.28it/s]

batch 720 loss: 0.26885263100266454


Train, Epoch 12 / 20:  47%|████▋     | 735/1563 [00:31<00:34, 24.15it/s]

batch 730 loss: 0.2439635943621397


Train, Epoch 12 / 20:  48%|████▊     | 744/1563 [00:31<00:33, 24.34it/s]

batch 740 loss: 0.28339422941207887


Train, Epoch 12 / 20:  48%|████▊     | 753/1563 [00:31<00:32, 24.78it/s]

batch 750 loss: 0.2824295602738857


Train, Epoch 12 / 20:  49%|████▉     | 765/1563 [00:32<00:32, 24.60it/s]

batch 760 loss: 0.2741357505321503


Train, Epoch 12 / 20:  50%|████▉     | 774/1563 [00:32<00:31, 24.85it/s]

batch 770 loss: 0.26515912264585495


Train, Epoch 12 / 20:  50%|█████     | 783/1563 [00:32<00:31, 24.87it/s]

batch 780 loss: 0.2712324447929859


Train, Epoch 12 / 20:  51%|█████     | 795/1563 [00:33<00:31, 24.76it/s]

batch 790 loss: 0.26129529625177383


Train, Epoch 12 / 20:  51%|█████▏    | 804/1563 [00:33<00:30, 24.80it/s]

batch 800 loss: 0.2954793617129326


Train, Epoch 12 / 20:  52%|█████▏    | 813/1563 [00:34<00:30, 24.70it/s]

batch 810 loss: 0.23153107687830926


Train, Epoch 12 / 20:  53%|█████▎    | 825/1563 [00:34<00:29, 24.85it/s]

batch 820 loss: 0.2726473338901997


Train, Epoch 12 / 20:  53%|█████▎    | 834/1563 [00:35<00:29, 24.71it/s]

batch 830 loss: 0.33402438163757325


Train, Epoch 12 / 20:  54%|█████▍    | 843/1563 [00:35<00:29, 24.57it/s]

batch 840 loss: 0.2687734842300415


Train, Epoch 12 / 20:  55%|█████▍    | 855/1563 [00:35<00:28, 24.82it/s]

batch 850 loss: 0.2355851337313652


Train, Epoch 12 / 20:  55%|█████▌    | 864/1563 [00:36<00:28, 24.51it/s]

batch 860 loss: 0.3925945725291967


Train, Epoch 12 / 20:  56%|█████▌    | 873/1563 [00:36<00:27, 24.75it/s]

batch 870 loss: 0.24855958819389343


Train, Epoch 12 / 20:  57%|█████▋    | 885/1563 [00:37<00:27, 24.83it/s]

batch 880 loss: 0.2369513414800167


Train, Epoch 12 / 20:  57%|█████▋    | 894/1563 [00:37<00:27, 24.54it/s]

batch 890 loss: 0.28423466980457307


Train, Epoch 12 / 20:  58%|█████▊    | 903/1563 [00:37<00:27, 24.38it/s]

batch 900 loss: 0.22875353321433067


Train, Epoch 12 / 20:  59%|█████▊    | 915/1563 [00:38<00:26, 24.67it/s]

batch 910 loss: 0.31056902930140495


Train, Epoch 12 / 20:  59%|█████▉    | 924/1563 [00:38<00:25, 24.61it/s]

batch 920 loss: 0.2858146607875824


Train, Epoch 12 / 20:  60%|█████▉    | 933/1563 [00:39<00:25, 24.72it/s]

batch 930 loss: 0.31283093690872193


Train, Epoch 12 / 20:  60%|██████    | 945/1563 [00:39<00:25, 24.72it/s]

batch 940 loss: 0.19840763211250306


Train, Epoch 12 / 20:  61%|██████    | 954/1563 [00:39<00:24, 24.70it/s]

batch 950 loss: 0.2683972179889679


Train, Epoch 12 / 20:  62%|██████▏   | 963/1563 [00:40<00:24, 24.59it/s]

batch 960 loss: 0.27425457537174225


Train, Epoch 12 / 20:  62%|██████▏   | 972/1563 [00:40<00:26, 22.65it/s]

batch 970 loss: 0.24437716007232665


Train, Epoch 12 / 20:  63%|██████▎   | 984/1563 [00:41<00:25, 22.66it/s]

batch 980 loss: 0.2882168017327785


Train, Epoch 12 / 20:  64%|██████▎   | 993/1563 [00:41<00:26, 21.33it/s]

batch 990 loss: 0.29094609916210173


Train, Epoch 12 / 20:  64%|██████▍   | 1002/1563 [00:42<00:26, 21.14it/s]

batch 1000 loss: 0.2515042372047901


Train, Epoch 12 / 20:  65%|██████▍   | 1014/1563 [00:42<00:27, 19.74it/s]

batch 1010 loss: 0.24418145716190337


Train, Epoch 12 / 20:  66%|██████▌   | 1025/1563 [00:43<00:23, 22.59it/s]

batch 1020 loss: 0.2920431077480316


Train, Epoch 12 / 20:  66%|██████▌   | 1034/1563 [00:43<00:22, 23.97it/s]

batch 1030 loss: 0.2846937797963619


Train, Epoch 12 / 20:  67%|██████▋   | 1043/1563 [00:43<00:21, 24.48it/s]

batch 1040 loss: 0.294845961779356


Train, Epoch 12 / 20:  67%|██████▋   | 1055/1563 [00:44<00:20, 24.77it/s]

batch 1050 loss: 0.28863326236605646


Train, Epoch 12 / 20:  68%|██████▊   | 1064/1563 [00:44<00:20, 24.60it/s]

batch 1060 loss: 0.25055170580744746


Train, Epoch 12 / 20:  69%|██████▊   | 1073/1563 [00:45<00:19, 24.75it/s]

batch 1070 loss: 0.28180351108312607


Train, Epoch 12 / 20:  69%|██████▉   | 1085/1563 [00:45<00:19, 24.57it/s]

batch 1080 loss: 0.19383491426706315


Train, Epoch 12 / 20:  70%|██████▉   | 1094/1563 [00:46<00:19, 24.44it/s]

batch 1090 loss: 0.2036989599466324


Train, Epoch 12 / 20:  71%|███████   | 1103/1563 [00:46<00:18, 24.42it/s]

batch 1100 loss: 0.2521874010562897


Train, Epoch 12 / 20:  71%|███████▏  | 1115/1563 [00:46<00:18, 24.63it/s]

batch 1110 loss: 0.1829136058688164


Train, Epoch 12 / 20:  72%|███████▏  | 1124/1563 [00:47<00:17, 24.84it/s]

batch 1120 loss: 0.27883847802877426


Train, Epoch 12 / 20:  72%|███████▏  | 1133/1563 [00:47<00:17, 24.76it/s]

batch 1130 loss: 0.24027328491210936


Train, Epoch 12 / 20:  73%|███████▎  | 1145/1563 [00:48<00:16, 24.96it/s]

batch 1140 loss: 0.31609226316213607


Train, Epoch 12 / 20:  74%|███████▍  | 1154/1563 [00:48<00:16, 24.73it/s]

batch 1150 loss: 0.293985116481781


Train, Epoch 12 / 20:  74%|███████▍  | 1163/1563 [00:48<00:16, 24.53it/s]

batch 1160 loss: 0.2070551857352257


Train, Epoch 12 / 20:  75%|███████▌  | 1175/1563 [00:49<00:15, 24.59it/s]

batch 1170 loss: 0.28952269852161405


Train, Epoch 12 / 20:  76%|███████▌  | 1184/1563 [00:49<00:15, 24.46it/s]

batch 1180 loss: 0.3148623965680599


Train, Epoch 12 / 20:  76%|███████▋  | 1193/1563 [00:50<00:15, 24.40it/s]

batch 1190 loss: 0.3118520364165306


Train, Epoch 12 / 20:  77%|███████▋  | 1205/1563 [00:50<00:14, 24.51it/s]

batch 1200 loss: 0.30063447877764704


Train, Epoch 12 / 20:  78%|███████▊  | 1214/1563 [00:50<00:14, 24.58it/s]

batch 1210 loss: 0.2522431932389736


Train, Epoch 12 / 20:  78%|███████▊  | 1223/1563 [00:51<00:13, 24.33it/s]

batch 1220 loss: 0.23695259913802147


Train, Epoch 12 / 20:  79%|███████▉  | 1232/1563 [00:51<00:13, 24.51it/s]

batch 1230 loss: 0.29964604005217554


Train, Epoch 12 / 20:  80%|███████▉  | 1244/1563 [00:52<00:12, 24.64it/s]

batch 1240 loss: 0.29332319647073746


Train, Epoch 12 / 20:  80%|████████  | 1253/1563 [00:52<00:12, 24.80it/s]

batch 1250 loss: 0.30806460604071617


Train, Epoch 12 / 20:  81%|████████  | 1262/1563 [00:52<00:12, 23.76it/s]

batch 1260 loss: 0.3222295567393303


Train, Epoch 12 / 20:  82%|████████▏ | 1274/1563 [00:53<00:12, 22.44it/s]

batch 1270 loss: 0.33901349157094957


Train, Epoch 12 / 20:  82%|████████▏ | 1283/1563 [00:53<00:12, 21.70it/s]

batch 1280 loss: 0.26255675554275515


Train, Epoch 12 / 20:  83%|████████▎ | 1292/1563 [00:54<00:13, 20.44it/s]

batch 1290 loss: 0.2421528786420822


Train, Epoch 12 / 20:  83%|████████▎ | 1304/1563 [00:54<00:12, 20.69it/s]

batch 1300 loss: 0.21263267397880553


Train, Epoch 12 / 20:  84%|████████▍ | 1313/1563 [00:55<00:11, 20.90it/s]

batch 1310 loss: 0.3096866376698017


Train, Epoch 12 / 20:  85%|████████▍ | 1322/1563 [00:55<00:10, 23.14it/s]

batch 1320 loss: 0.29172362610697744


Train, Epoch 12 / 20:  85%|████████▌ | 1334/1563 [00:56<00:09, 23.78it/s]

batch 1330 loss: 0.2973839834332466


Train, Epoch 12 / 20:  86%|████████▌ | 1343/1563 [00:56<00:09, 24.15it/s]

batch 1340 loss: 0.30638106912374496


Train, Epoch 12 / 20:  87%|████████▋ | 1352/1563 [00:56<00:08, 24.16it/s]

batch 1350 loss: 0.19077638387680054


Train, Epoch 12 / 20:  87%|████████▋ | 1364/1563 [00:57<00:08, 24.04it/s]

batch 1360 loss: 0.23458720073103906


Train, Epoch 12 / 20:  88%|████████▊ | 1373/1563 [00:57<00:07, 24.40it/s]

batch 1370 loss: 0.2327631637454033


Train, Epoch 12 / 20:  89%|████████▊ | 1385/1563 [00:58<00:07, 24.24it/s]

batch 1380 loss: 0.27567595839500425


Train, Epoch 12 / 20:  89%|████████▉ | 1394/1563 [00:58<00:06, 24.57it/s]

batch 1390 loss: 0.2467144913971424


Train, Epoch 12 / 20:  90%|████████▉ | 1403/1563 [00:59<00:06, 23.93it/s]

batch 1400 loss: 0.20150939449667932


Train, Epoch 12 / 20:  90%|█████████ | 1412/1563 [00:59<00:06, 24.19it/s]

batch 1410 loss: 0.28431950360536573


Train, Epoch 12 / 20:  91%|█████████ | 1424/1563 [00:59<00:05, 24.07it/s]

batch 1420 loss: 0.2878099426627159


Train, Epoch 12 / 20:  92%|█████████▏| 1433/1563 [01:00<00:05, 23.88it/s]

batch 1430 loss: 0.2561605617403984


Train, Epoch 12 / 20:  92%|█████████▏| 1445/1563 [01:00<00:04, 24.44it/s]

batch 1440 loss: 0.26834104359149935


Train, Epoch 12 / 20:  93%|█████████▎| 1454/1563 [01:01<00:04, 24.65it/s]

batch 1450 loss: 0.3769977897405624


Train, Epoch 12 / 20:  94%|█████████▎| 1463/1563 [01:01<00:04, 24.68it/s]

batch 1460 loss: 0.22233604341745378


Train, Epoch 12 / 20:  94%|█████████▍| 1475/1563 [01:02<00:03, 24.80it/s]

batch 1470 loss: 0.2534922629594803


Train, Epoch 12 / 20:  95%|█████████▍| 1484/1563 [01:02<00:03, 24.76it/s]

batch 1480 loss: 0.22216226309537887


Train, Epoch 12 / 20:  96%|█████████▌| 1493/1563 [01:02<00:02, 24.42it/s]

batch 1490 loss: 0.2175725817680359


Train, Epoch 12 / 20:  96%|█████████▋| 1505/1563 [01:03<00:02, 24.37it/s]

batch 1500 loss: 0.27281111404299735


Train, Epoch 12 / 20:  97%|█████████▋| 1514/1563 [01:03<00:01, 24.61it/s]

batch 1510 loss: 0.27637003622949124


Train, Epoch 12 / 20:  97%|█████████▋| 1523/1563 [01:03<00:01, 24.63it/s]

batch 1520 loss: 0.31677880585193635


Train, Epoch 12 / 20:  98%|█████████▊| 1535/1563 [01:04<00:01, 24.76it/s]

batch 1530 loss: 0.32748129665851594


Train, Epoch 12 / 20:  99%|█████████▉| 1544/1563 [01:04<00:00, 24.76it/s]

batch 1540 loss: 0.27549671977758405


Train, Epoch 12 / 20:  99%|█████████▉| 1553/1563 [01:05<00:00, 24.60it/s]

batch 1550 loss: 0.24734998643398284


Train, Epoch 12 / 20: 100%|██████████| 1563/1563 [01:05<00:00, 23.81it/s]


batch 1560 loss: 0.23815446421504022


Test, Epoch 12 / 20: 100%|██████████| 1563/1563 [00:30<00:00, 51.02it/s]


Epoch 12, loss: 0.4567181300011277, accuracy: 0.8138


Train, Epoch 13 / 20:   1%|          | 15/1563 [00:00<01:02, 24.77it/s]

batch 10 loss: 0.3833178475499153


Train, Epoch 13 / 20:   2%|▏         | 24/1563 [00:00<01:02, 24.75it/s]

batch 20 loss: 0.3447577267885208


Train, Epoch 13 / 20:   2%|▏         | 33/1563 [00:01<01:01, 24.82it/s]

batch 30 loss: 0.2548444136977196


Train, Epoch 13 / 20:   3%|▎         | 45/1563 [00:01<01:01, 24.83it/s]

batch 40 loss: 0.2989749975502491


Train, Epoch 13 / 20:   3%|▎         | 54/1563 [00:02<01:01, 24.69it/s]

batch 50 loss: 0.2850365832448006


Train, Epoch 13 / 20:   4%|▍         | 63/1563 [00:02<01:00, 24.77it/s]

batch 60 loss: 0.2717067927122116


Train, Epoch 13 / 20:   5%|▍         | 75/1563 [00:03<01:00, 24.74it/s]

batch 70 loss: 0.2849646478891373


Train, Epoch 13 / 20:   5%|▌         | 84/1563 [00:03<00:59, 24.79it/s]

batch 80 loss: 0.2580584615468979


Train, Epoch 13 / 20:   6%|▌         | 93/1563 [00:03<00:59, 24.71it/s]

batch 90 loss: 0.1761720634996891


Train, Epoch 13 / 20:   7%|▋         | 102/1563 [00:04<01:00, 24.22it/s]

batch 100 loss: 0.28882693573832513


Train, Epoch 13 / 20:   7%|▋         | 114/1563 [00:04<00:58, 24.65it/s]

batch 110 loss: 0.3270749233663082


Train, Epoch 13 / 20:   8%|▊         | 123/1563 [00:05<00:59, 24.36it/s]

batch 120 loss: 0.22772933542728424


Train, Epoch 13 / 20:   9%|▊         | 135/1563 [00:05<00:57, 24.74it/s]

batch 130 loss: 0.17744909711182116


Train, Epoch 13 / 20:   9%|▉         | 144/1563 [00:05<00:57, 24.72it/s]

batch 140 loss: 0.27357256412506104


Train, Epoch 13 / 20:  10%|▉         | 153/1563 [00:06<00:58, 24.31it/s]

batch 150 loss: 0.27374621033668517


Train, Epoch 13 / 20:  10%|█         | 162/1563 [00:06<00:57, 24.53it/s]

batch 160 loss: 0.2558061555027962


Train, Epoch 13 / 20:  11%|█         | 174/1563 [00:07<01:03, 21.80it/s]

batch 170 loss: 0.24593160673975945


Train, Epoch 13 / 20:  12%|█▏        | 183/1563 [00:07<01:04, 21.26it/s]

batch 180 loss: 0.22947660237550735


Train, Epoch 13 / 20:  12%|█▏        | 192/1563 [00:08<01:05, 21.08it/s]

batch 190 loss: 0.2587231777608395


Train, Epoch 13 / 20:  13%|█▎        | 204/1563 [00:08<01:03, 21.51it/s]

batch 200 loss: 0.24571449607610701


Train, Epoch 13 / 20:  14%|█▎        | 213/1563 [00:09<01:04, 20.96it/s]

batch 210 loss: 0.1947976104915142


Train, Epoch 13 / 20:  14%|█▍        | 225/1563 [00:09<00:57, 23.42it/s]

batch 220 loss: 0.23855433166027068


Train, Epoch 13 / 20:  15%|█▍        | 234/1563 [00:09<00:54, 24.41it/s]

batch 230 loss: 0.21566095799207688


Train, Epoch 13 / 20:  16%|█▌        | 243/1563 [00:10<00:54, 24.34it/s]

batch 240 loss: 0.28216252028942107


Train, Epoch 13 / 20:  16%|█▋        | 255/1563 [00:10<00:53, 24.67it/s]

batch 250 loss: 0.28886147812008856


Train, Epoch 13 / 20:  17%|█▋        | 264/1563 [00:11<00:52, 24.76it/s]

batch 260 loss: 0.3017366588115692


Train, Epoch 13 / 20:  17%|█▋        | 273/1563 [00:11<00:51, 24.84it/s]

batch 270 loss: 0.23559602200984955


Train, Epoch 13 / 20:  18%|█▊        | 285/1563 [00:11<00:51, 24.74it/s]

batch 280 loss: 0.23200460150837898


Train, Epoch 13 / 20:  19%|█▉        | 294/1563 [00:12<00:51, 24.51it/s]

batch 290 loss: 0.29454015493392943


Train, Epoch 13 / 20:  19%|█▉        | 303/1563 [00:12<00:50, 24.74it/s]

batch 300 loss: 0.21574767008423806


Train, Epoch 13 / 20:  20%|█▉        | 312/1563 [00:13<00:50, 24.80it/s]

batch 310 loss: 0.1923902414739132


Train, Epoch 13 / 20:  21%|██        | 324/1563 [00:13<00:49, 24.79it/s]

batch 320 loss: 0.2852893531322479


Train, Epoch 13 / 20:  21%|██▏       | 333/1563 [00:13<00:49, 24.94it/s]

batch 330 loss: 0.26402916200459003


Train, Epoch 13 / 20:  22%|██▏       | 342/1563 [00:14<00:51, 23.87it/s]

batch 340 loss: 0.23620034754276276


Train, Epoch 13 / 20:  23%|██▎       | 354/1563 [00:14<00:49, 24.50it/s]

batch 350 loss: 0.2793830171227455


Train, Epoch 13 / 20:  23%|██▎       | 363/1563 [00:15<00:48, 24.57it/s]

batch 360 loss: 0.19307721853256227


Train, Epoch 13 / 20:  24%|██▍       | 375/1563 [00:15<00:48, 24.71it/s]

batch 370 loss: 0.21671565994620323


Train, Epoch 13 / 20:  25%|██▍       | 384/1563 [00:15<00:47, 24.70it/s]

batch 380 loss: 0.1724008873105049


Train, Epoch 13 / 20:  25%|██▌       | 393/1563 [00:16<00:47, 24.58it/s]

batch 390 loss: 0.253687322512269


Train, Epoch 13 / 20:  26%|██▌       | 405/1563 [00:16<00:46, 24.72it/s]

batch 400 loss: 0.23992013856768607


Train, Epoch 13 / 20:  26%|██▋       | 414/1563 [00:17<00:46, 24.59it/s]

batch 410 loss: 0.22928463034331797


Train, Epoch 13 / 20:  27%|██▋       | 423/1563 [00:17<00:46, 24.53it/s]

batch 420 loss: 0.20783228576183319


Train, Epoch 13 / 20:  28%|██▊       | 435/1563 [00:18<00:45, 24.70it/s]

batch 430 loss: 0.26486576348543167


Train, Epoch 13 / 20:  28%|██▊       | 444/1563 [00:18<00:45, 24.50it/s]

batch 440 loss: 0.21809327825903893


Train, Epoch 13 / 20:  29%|██▉       | 453/1563 [00:18<00:44, 24.69it/s]

batch 450 loss: 0.22215531542897224


Train, Epoch 13 / 20:  30%|██▉       | 462/1563 [00:19<00:46, 23.85it/s]

batch 460 loss: 0.3096035450696945


Train, Epoch 13 / 20:  30%|███       | 474/1563 [00:19<00:47, 22.87it/s]

batch 470 loss: 0.3217111587524414


Train, Epoch 13 / 20:  31%|███       | 483/1563 [00:20<00:46, 23.00it/s]

batch 480 loss: 0.27789740785956385


Train, Epoch 13 / 20:  31%|███▏      | 492/1563 [00:20<00:47, 22.68it/s]

batch 490 loss: 0.2501616783440113


Train, Epoch 13 / 20:  32%|███▏      | 504/1563 [00:21<00:48, 22.03it/s]

batch 500 loss: 0.23409328535199164


Train, Epoch 13 / 20:  33%|███▎      | 513/1563 [00:21<00:49, 21.17it/s]

batch 510 loss: 0.23475678265094757


Train, Epoch 13 / 20:  34%|███▎      | 525/1563 [00:22<00:45, 22.58it/s]

batch 520 loss: 0.23073672577738763


Train, Epoch 13 / 20:  34%|███▍      | 534/1563 [00:22<00:42, 23.99it/s]

batch 530 loss: 0.2582061685621738


Train, Epoch 13 / 20:  35%|███▍      | 543/1563 [00:22<00:41, 24.31it/s]

batch 540 loss: 0.2968145027756691


Train, Epoch 13 / 20:  36%|███▌      | 555/1563 [00:23<00:40, 24.86it/s]

batch 550 loss: 0.23163783103227614


Train, Epoch 13 / 20:  36%|███▌      | 564/1563 [00:23<00:41, 24.32it/s]

batch 560 loss: 0.2258426994085312


Train, Epoch 13 / 20:  37%|███▋      | 573/1563 [00:23<00:40, 24.65it/s]

batch 570 loss: 0.2613793984055519


Train, Epoch 13 / 20:  37%|███▋      | 585/1563 [00:24<00:39, 24.65it/s]

batch 580 loss: 0.2230544738471508


Train, Epoch 13 / 20:  38%|███▊      | 594/1563 [00:24<00:39, 24.48it/s]

batch 590 loss: 0.22558702379465104


Train, Epoch 13 / 20:  39%|███▊      | 603/1563 [00:25<00:39, 24.60it/s]

batch 600 loss: 0.28342230767011645


Train, Epoch 13 / 20:  39%|███▉      | 615/1563 [00:25<00:38, 24.63it/s]

batch 610 loss: 0.3023577481508255


Train, Epoch 13 / 20:  40%|███▉      | 624/1563 [00:26<00:38, 24.71it/s]

batch 620 loss: 0.25048239678144457


Train, Epoch 13 / 20:  40%|████      | 633/1563 [00:26<00:37, 24.76it/s]

batch 630 loss: 0.2472079239785671


Train, Epoch 13 / 20:  41%|████▏     | 645/1563 [00:26<00:37, 24.75it/s]

batch 640 loss: 0.25744440853595735


Train, Epoch 13 / 20:  42%|████▏     | 654/1563 [00:27<00:36, 24.72it/s]

batch 650 loss: 0.2815647184848785


Train, Epoch 13 / 20:  42%|████▏     | 663/1563 [00:27<00:37, 24.31it/s]

batch 660 loss: 0.28598037660121917


Train, Epoch 13 / 20:  43%|████▎     | 675/1563 [00:28<00:36, 24.43it/s]

batch 670 loss: 0.2813454821705818


Train, Epoch 13 / 20:  44%|████▍     | 684/1563 [00:28<00:36, 24.32it/s]

batch 680 loss: 0.21650710552930832


Train, Epoch 13 / 20:  44%|████▍     | 693/1563 [00:28<00:35, 24.60it/s]

batch 690 loss: 0.28128410764038564


Train, Epoch 13 / 20:  45%|████▌     | 705/1563 [00:29<00:34, 24.59it/s]

batch 700 loss: 0.22076397389173508


Train, Epoch 13 / 20:  46%|████▌     | 714/1563 [00:29<00:35, 24.21it/s]

batch 710 loss: 0.271130608022213


Train, Epoch 13 / 20:  46%|████▋     | 723/1563 [00:30<00:34, 24.55it/s]

batch 720 loss: 0.17084308415651323


Train, Epoch 13 / 20:  47%|████▋     | 735/1563 [00:30<00:33, 24.69it/s]

batch 730 loss: 0.22048618122935296


Train, Epoch 13 / 20:  48%|████▊     | 744/1563 [00:30<00:33, 24.26it/s]

batch 740 loss: 0.27614264041185377


Train, Epoch 13 / 20:  48%|████▊     | 753/1563 [00:31<00:33, 24.15it/s]

batch 750 loss: 0.2906257141381502


Train, Epoch 13 / 20:  49%|████▉     | 762/1563 [00:31<00:33, 23.66it/s]

batch 760 loss: 0.2511206120252609


Train, Epoch 13 / 20:  50%|████▉     | 774/1563 [00:32<00:35, 22.29it/s]

batch 770 loss: 0.2507318776100874


Train, Epoch 13 / 20:  50%|█████     | 783/1563 [00:32<00:35, 21.95it/s]

batch 780 loss: 0.2469298131763935


Train, Epoch 13 / 20:  51%|█████     | 792/1563 [00:33<00:35, 21.95it/s]

batch 790 loss: 0.3088237576186657


Train, Epoch 13 / 20:  51%|█████▏    | 804/1563 [00:33<00:36, 20.60it/s]

batch 800 loss: 0.2648245021700859


Train, Epoch 13 / 20:  52%|█████▏    | 813/1563 [00:34<00:35, 20.92it/s]

batch 810 loss: 0.2838293142616749


Train, Epoch 13 / 20:  53%|█████▎    | 825/1563 [00:34<00:32, 22.77it/s]

batch 820 loss: 0.2241753578186035


Train, Epoch 13 / 20:  53%|█████▎    | 834/1563 [00:35<00:30, 23.77it/s]

batch 830 loss: 0.23510611653327942


Train, Epoch 13 / 20:  54%|█████▍    | 843/1563 [00:35<00:29, 24.40it/s]

batch 840 loss: 0.2024427779018879


Train, Epoch 13 / 20:  55%|█████▍    | 855/1563 [00:35<00:28, 24.49it/s]

batch 850 loss: 0.2108479082584381


Train, Epoch 13 / 20:  55%|█████▌    | 864/1563 [00:36<00:28, 24.67it/s]

batch 860 loss: 0.21304268203675747


Train, Epoch 13 / 20:  56%|█████▌    | 873/1563 [00:36<00:28, 24.63it/s]

batch 870 loss: 0.26394313722848894


Train, Epoch 13 / 20:  57%|█████▋    | 885/1563 [00:37<00:27, 24.37it/s]

batch 880 loss: 0.2311612531542778


Train, Epoch 13 / 20:  57%|█████▋    | 894/1563 [00:37<00:27, 24.51it/s]

batch 890 loss: 0.24767320975661278


Train, Epoch 13 / 20:  58%|█████▊    | 903/1563 [00:37<00:26, 24.60it/s]

batch 900 loss: 0.29914626106619835


Train, Epoch 13 / 20:  59%|█████▊    | 915/1563 [00:38<00:26, 24.85it/s]

batch 910 loss: 0.28869383931159975


Train, Epoch 13 / 20:  59%|█████▉    | 924/1563 [00:38<00:25, 24.78it/s]

batch 920 loss: 0.2293805181980133


Train, Epoch 13 / 20:  60%|█████▉    | 933/1563 [00:39<00:25, 24.77it/s]

batch 930 loss: 0.3203867178410292


Train, Epoch 13 / 20:  60%|██████    | 945/1563 [00:39<00:24, 24.83it/s]

batch 940 loss: 0.19002872258424758


Train, Epoch 13 / 20:  61%|██████    | 954/1563 [00:39<00:24, 24.73it/s]

batch 950 loss: 0.26547401659190656


Train, Epoch 13 / 20:  62%|██████▏   | 963/1563 [00:40<00:24, 24.84it/s]

batch 960 loss: 0.21411362513899804


Train, Epoch 13 / 20:  62%|██████▏   | 975/1563 [00:40<00:23, 24.87it/s]

batch 970 loss: 0.24582748264074325


Train, Epoch 13 / 20:  63%|██████▎   | 984/1563 [00:41<00:23, 24.59it/s]

batch 980 loss: 0.29598632305860517


Train, Epoch 13 / 20:  64%|██████▎   | 993/1563 [00:41<00:23, 24.67it/s]

batch 990 loss: 0.23346520960330963


Train, Epoch 13 / 20:  64%|██████▍   | 1005/1563 [00:42<00:22, 24.95it/s]

batch 1000 loss: 0.27558272182941435


Train, Epoch 13 / 20:  65%|██████▍   | 1014/1563 [00:42<00:22, 24.79it/s]

batch 1010 loss: 0.2565056376159191


Train, Epoch 13 / 20:  65%|██████▌   | 1023/1563 [00:42<00:22, 24.33it/s]

batch 1020 loss: 0.28604199439287187


Train, Epoch 13 / 20:  66%|██████▌   | 1035/1563 [00:43<00:21, 24.42it/s]

batch 1030 loss: 0.327666175365448


Train, Epoch 13 / 20:  67%|██████▋   | 1044/1563 [00:43<00:21, 24.46it/s]

batch 1040 loss: 0.23097299486398698


Train, Epoch 13 / 20:  67%|██████▋   | 1053/1563 [00:43<00:20, 24.71it/s]

batch 1050 loss: 0.272315987944603


Train, Epoch 13 / 20:  68%|██████▊   | 1062/1563 [00:44<00:21, 23.18it/s]

batch 1060 loss: 0.22685377523303032


Train, Epoch 13 / 20:  69%|██████▊   | 1074/1563 [00:44<00:23, 20.96it/s]

batch 1070 loss: 0.21777333542704583


Train, Epoch 13 / 20:  69%|██████▉   | 1083/1563 [00:45<00:22, 21.20it/s]

batch 1080 loss: 0.2540401589125395


Train, Epoch 13 / 20:  70%|██████▉   | 1092/1563 [00:45<00:21, 22.39it/s]

batch 1090 loss: 0.20632897615432738


Train, Epoch 13 / 20:  71%|███████   | 1104/1563 [00:46<00:21, 21.49it/s]

batch 1100 loss: 0.3225956857204437


Train, Epoch 13 / 20:  71%|███████   | 1113/1563 [00:46<00:21, 21.13it/s]

batch 1110 loss: 0.2546506136655807


Train, Epoch 13 / 20:  72%|███████▏  | 1125/1563 [00:47<00:18, 23.29it/s]

batch 1120 loss: 0.21777338758111


Train, Epoch 13 / 20:  73%|███████▎  | 1134/1563 [00:47<00:17, 24.36it/s]

batch 1130 loss: 0.2603392772376537


Train, Epoch 13 / 20:  73%|███████▎  | 1143/1563 [00:48<00:16, 24.71it/s]

batch 1140 loss: 0.22933901958167552


Train, Epoch 13 / 20:  74%|███████▍  | 1155/1563 [00:48<00:16, 24.43it/s]

batch 1150 loss: 0.2825139328837395


Train, Epoch 13 / 20:  74%|███████▍  | 1164/1563 [00:48<00:16, 24.77it/s]

batch 1160 loss: 0.259818746894598


Train, Epoch 13 / 20:  75%|███████▌  | 1173/1563 [00:49<00:15, 24.85it/s]

batch 1170 loss: 0.251991818100214


Train, Epoch 13 / 20:  76%|███████▌  | 1185/1563 [00:49<00:15, 24.50it/s]

batch 1180 loss: 0.29489524513483045


Train, Epoch 13 / 20:  76%|███████▋  | 1194/1563 [00:50<00:15, 24.44it/s]

batch 1190 loss: 0.28753083050251005


Train, Epoch 13 / 20:  77%|███████▋  | 1203/1563 [00:50<00:14, 24.31it/s]

batch 1200 loss: 0.3825414929538965


Train, Epoch 13 / 20:  78%|███████▊  | 1215/1563 [00:50<00:14, 24.53it/s]

batch 1210 loss: 0.2554697148501873


Train, Epoch 13 / 20:  78%|███████▊  | 1224/1563 [00:51<00:14, 24.03it/s]

batch 1220 loss: 0.3582225613296032


Train, Epoch 13 / 20:  79%|███████▉  | 1233/1563 [00:51<00:13, 24.37it/s]

batch 1230 loss: 0.24253300502896308


Train, Epoch 13 / 20:  80%|███████▉  | 1245/1563 [00:52<00:12, 24.79it/s]

batch 1240 loss: 0.29773085117340087


Train, Epoch 13 / 20:  80%|████████  | 1254/1563 [00:52<00:12, 24.27it/s]

batch 1250 loss: 0.19439727663993836


Train, Epoch 13 / 20:  81%|████████  | 1263/1563 [00:52<00:12, 24.26it/s]

batch 1260 loss: 0.24826478883624076


Train, Epoch 13 / 20:  82%|████████▏ | 1275/1563 [00:53<00:11, 24.42it/s]

batch 1270 loss: 0.2823935478925705


Train, Epoch 13 / 20:  82%|████████▏ | 1284/1563 [00:53<00:11, 24.26it/s]

batch 1280 loss: 0.2867221042513847


Train, Epoch 13 / 20:  83%|████████▎ | 1293/1563 [00:54<00:11, 24.25it/s]

batch 1290 loss: 0.32728798389434816


Train, Epoch 13 / 20:  83%|████████▎ | 1305/1563 [00:54<00:10, 23.95it/s]

batch 1300 loss: 0.23993680402636527


Train, Epoch 13 / 20:  84%|████████▍ | 1314/1563 [00:55<00:10, 24.24it/s]

batch 1310 loss: 0.2153523437678814


# **3. Naive MoE that has the same token choosing strategy and MLP architecture as the fully vectorized MoE for comparison. Third part of the task.**